bp / resqpy

Python API for working with RESQML models
https://resqpy.readthedocs.io/en/latest/
MIT License
52 stars 13 forks source link

CopyAllParts fails with invalid UUID patterns #192

Closed mgimhof closed 3 years ago

mgimhof commented 3 years ago

The little snippet fails due to invalid UUID patterns (in hdf5).

import resqpy.model as rq
model = rq.Model('d:/dev/resqpy/tests/test_data/UGRID_GRID.epc')
new_model = rq.Model('d:/new.epc', new_epc=True)
new_model.copy_all_parts_from_other_model(model)
new_model.store_epc('d:/new.epc')

It fails with:

Traceback (most recent call last):
  File "D:\DEV\resqpy\test_local\rewrite_ugrid.py", line 14, in <module>
    new_model.copy_all_parts_from_other_model(model)
  File "D:\DEV\resqpy\resqpy\model.py", line 3348, in copy_all_parts_from_other_model
    self.copy_part_from_other_model(other_model,
  File "D:\DEV\resqpy\resqpy\model.py", line 3232, in copy_part_from_other_model
    hdf5_count = whdf5.copy_h5(other_h5_file_name, self_h5_file_name, uuid_inclusion_list = [uuid], mode = 'a')
  File "D:\DEV\resqpy\resqpy\olio\write_hdf5.py", line 167, in copy_h5
    uuid = bu.uuid_from_string(group)
  File "D:\DEV\resqpy\resqpy\olio\uuid.py", line 111, in uuid_from_string

because it encounters

'Fault_4346bb9c-68a5-4591-86df-9284316d0dd3' or 'UnstructuredGrid_492f069b-888a-4a17-8cc1-cdc845774f18_Geometry'

I think the issue arises from the hdf paths:

<eml20:PathInHdfFile xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string">RESQML/UnstructuredGrid_492f069b-888a-4a17-8cc1-cdc845774f18_Geometry/USER_CELL_INDEX</eml20:PathInHdfFile> or

<eml20:PathInHdfFile xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:type="xsd:string">RESQML/Fault_4346bb9c-68a5-4591-86df-9284316d0dd3/Fault_Cells</eml20:PathInHdfFile> It could well be that a very well-known vendor is ignoring standard or business rule with regard to naming the hdf path within the h5 file. I have always treated PathInHdfFile as is rather than assuming some semantics.

ugrid.zip

andy-beer commented 3 years ago

Thanks for highlighting this @mgimhof – I think your analysis is correct. There are probably parts of resqpy code that make assumptions about the semantics or structure of hdf5 internal paths. We should treat that as a bug and modify the code so that it always reads the hdf5 internal paths from the xml and does not make any unnecessary assumptions.

mgimhof commented 3 years ago

@andy-beer Thanks Andy!

andy-beer commented 3 years ago

@mgimhof The hdf5_paths branch has changes which should respect arbitrary internal hdf5 parts when using the Model copy_part_from_other_model() or copy_all_parts_from_other_model() methods. Could you please check out the branch in its current form and see whether it resolves the issue for your example dataset.

I will keep the pull request open for now as I need to check the rest of the codebase for other unnecessary assumptions when reading hdf5 data.

mgimhof commented 3 years ago

@andy-beer yes, your branch resolves my issue with copying this dataset

andy-beer commented 3 years ago

Thanks for testing @mgimhof – the new code should support any hdf5 internal paths, as long as there is at least one hdf5 group (eg. '/RESQML/') in the path before the final hdf5 dataset name.

andy-beer commented 3 years ago

For your info, @mgimhof , the hdf5_paths branch has now been merged into master and the code tagged as v1.1.4

andy-beer commented 3 years ago

Apologies @mgimhof – I just noticed that I failed to create tag v1.1.4 yesterday – it is now in place, with the changes to support arbitrary hdf5 internal paths.