jrt54 opened this issue 3 years ago
I wonder if resqpy.organize.OrganizationFeature is intended for this purpose?
To create a new RESQML dataset containing a subset of objects, a new empty model can be created (for example with the convenience function resqpy.model.new_model()) and then Model.copy_part_from_other_model() can be called repeatedly, once for each part to be copied.
That method identifies required referenced objects and recursively includes those in the copy.
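The recursive inclusion described above amounts to computing the transitive closure of references. A minimal plain-Python sketch, assuming the references have already been extracted into a dict from each object's UUID to the UUIDs it refers to (the dict, the object names and the function referenced_closure are hypothetical; resqpy reads the actual references from the RESQML xml):

```python
from collections import deque


def referenced_closure(start_uuids, references):
    """Return the start objects plus everything they reference, directly or indirectly.

    references: hypothetical dict mapping an object's uuid to the uuids it refers to.
    """
    included = set()
    queue = deque(start_uuids)
    while queue:
        obj = queue.popleft()
        if obj in included:
            continue  # already included; this also guards against reference cycles
        included.add(obj)
        queue.extend(references.get(obj, ()))
    return included


# toy example: a permeability property refers to its grid, which refers to its crs
references = {
    'perm_property': ['grid'],
    'grid': ['crs'],
    'well_trajectory': ['crs', 'md_datum'],
}
print(sorted(referenced_closure(['perm_property'], references)))
# prints ['crs', 'grid', 'perm_property']
```

A breadth-first queue with a visited set is used so that shared dependencies (such as a crs referenced by many objects) are included only once.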
In RESQML, realization is a concept specific to properties. If other objects have competing versions, then multiple Interpretation objects linked to a single Feature object can be used to organise the alternatives, with each Representation object relating to one Interpretation object.
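The Feature / Interpretation / Representation hierarchy can be pictured with a few toy classes. This is a plain-Python sketch, not resqpy's object classes; the class names, titles and the helper representations_for are hypothetical. It shows two competing interpretations of one fault feature, each with its own representation:

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity, like distinct objects with their own uuids
class Feature:
    title: str


@dataclass(eq=False)
class Interpretation:
    feature: Feature
    title: str


@dataclass(eq=False)
class Representation:
    interpretation: Interpretation
    title: str


def representations_for(interpretation, representations):
    """Select the representations belonging to one interpretation (one 'version')."""
    return [r for r in representations if r.interpretation is interpretation]


# one fault feature with two competing interpretations
fault = Feature('fault F1')
high_case = Interpretation(fault, 'F1 high throw')
low_case = Interpretation(fault, 'F1 low throw')
surfaces = [
    Representation(high_case, 'F1 surface, high throw'),
    Representation(low_case, 'F1 surface, low throw'),
]

for interp in (high_case, low_case):
    titles = [r.title for r in representations_for(interp, surfaces)]
    print(interp.title, '->', titles)
```

Choosing one interpretation then selects a consistent "version" of the fault, without needing a realization number on non-property objects.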
We also use extra metadata in some places in resqpy, but code using other RESQML APIs will not know how to interpret such metadata.
With regard to Property realizations specifically, the resqpy PropertyCollection class has methods for selecting a subset of properties (returning a new PropertyCollection).
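As a rough illustration of what selecting a subset by realization means, here is a plain-Python analogue that filters property records and returns a new list, leaving the original untouched. The PropertyRecord class and the select_properties function are hypothetical stand-ins; in practice the selection methods of resqpy's PropertyCollection would be used instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PropertyRecord:
    uuid: str
    property_kind: str
    realization: Optional[int]  # None for properties that are not part of an ensemble


def select_properties(records: List[PropertyRecord],
                      realization: Optional[int] = None,
                      property_kind: Optional[str] = None) -> List[PropertyRecord]:
    """Return a new list with only the records matching every criterion given.

    A criterion left as None is not applied.
    """
    return [r for r in records
            if (realization is None or r.realization == realization)
            and (property_kind is None or r.property_kind == property_kind)]


records = [
    PropertyRecord('u0', 'permeability rock', 0),
    PropertyRecord('u1', 'permeability rock', 1),
    PropertyRecord('u2', 'porosity', 0),
    PropertyRecord('u3', 'porosity', 1),
]
perm_r1 = select_properties(records, realization=1, property_kind='permeability rock')
print([r.uuid for r in perm_r1])  # prints ['u1']
```

Returning a new collection rather than filtering in place means several overlapping subsets can be held at once.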
This sounds great.
One follow-up question: can this copying be done from a large existing model (with a corresponding epc file) into a sort of "virtual" model_copy that has no epc file and has not been written to disk?
A newly created Model object does not get written to disc until the store_epc() method is called. However, the copy_part... methods currently copy the hdf5 data as well. That is in line with our policy of having a one-to-one correspondence between epc and h5 files. With a little development, we could use existing flexibility in the underlying methods (and in RESQML) so that the subset model points to arrays in the existing large h5 file instead of copying them.
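A minimal sketch of the in-memory workflow described above, written against the model methods named in this thread (copy_part_from_other_model() and store_epc()) plus part_for_uuid(), which is assumed here to map a UUID to its part name. The helper build_subset is hypothetical, and the FakeModel class exists only so that the sketch runs without resqpy or any files. Nothing is written until store_epc() is called; whether h5 arrays are duplicated is decided by the real copy method, not by this helper.

```python
def build_subset(source_model, target_model, uuids, store=False):
    """Copy the parts for the given uuids into target_model.

    The target stays in memory unless store=True, in which case its
    store_epc() method is called once at the end.
    """
    for uuid in uuids:
        part = source_model.part_for_uuid(uuid)  # assumed Model method
        # referenced objects (crs, grid, ...) are expected to be pulled in by the copy itself
        target_model.copy_part_from_other_model(source_model, part)
    if store:
        target_model.store_epc()
    return target_model


class FakeModel:
    """Stand-in that records calls, so the sketch runs without resqpy."""

    def __init__(self, parts=None):
        self._parts = dict(parts or {})  # uuid -> part name
        self.copied = []
        self.stored = False

    def part_for_uuid(self, uuid):
        return self._parts[uuid]

    def copy_part_from_other_model(self, other_model, part):
        self.copied.append(part)

    def store_epc(self):
        self.stored = True


source = FakeModel({'u1': 'obj_WellboreTrajectoryRepresentation_u1.xml'})
subset = build_subset(source, FakeModel(), ['u1'])
print(subset.copied, subset.stored)
```

With resqpy, source would be the large existing model and target a fresh model such as one created with resqpy.model.new_model(); per the reply above, the current copy methods also copy the hdf5 data into the target's own h5 file.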
Some kind of resqpy class for iterating over realizations or specific instances of a model would be ideal for running applications on specific subsets of data (e.g. specific realizations, or with certain wells included/excluded).
Maybe this could be done with sets of UUIDs.
Something roughly like this:
```python
wells_of_interest = ['wellname0', 'wellname1']
```
Then it would be easy, in a notebook or script, to implement a single function called my_own_function and test how it performs, and what results it gives, on different subsets of wells or different subsets/realizations of permeability.
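The sets-of-UUIDs idea can be sketched with Python's built-in set operations. Here the name-to-UUID mapping is fabricated with uuid.uuid4() purely for illustration (in a real model it would come from the well objects), the realization numbers are arbitrary, and uuid_subset and my_own_function are hypothetical placeholders:

```python
import itertools
import uuid

# hypothetical mapping from well name to well uuid
well_uuids = {name: uuid.uuid4() for name in ('wellname0', 'wellname1', 'wellname2')}
wells_of_interest = ['wellname0', 'wellname1']
perm_realizations = [0, 1]  # hypothetical realization numbers


def uuid_subset(name_to_uuid, include=None, exclude=()):
    """Return a frozenset of uuids: all wells (or those in include), minus exclude."""
    names = set(name_to_uuid) if include is None else set(include)
    return frozenset(name_to_uuid[n] for n in names - set(exclude))


def my_own_function(well_subset, realization):
    """Placeholder analysis: just report what it was given."""
    return len(well_subset), realization


well_subsets = {
    'of_interest': uuid_subset(well_uuids, include=wells_of_interest),
    'without_wellname0': uuid_subset(well_uuids, exclude=['wellname0']),
}

# run the same function over every (well subset, realization) combination
results = {
    (name, r): my_own_function(subset, r)
    for (name, subset), r in itertools.product(well_subsets.items(), perm_realizations)
}
for key, value in sorted(results.items()):
    print(key, value)
```

Frozensets are hashable, so a subset can itself serve as a dictionary key or be deduplicated when many subsets are generated.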
If model.subset returned an object that could be pickled, then I /think/ that would also be enough to turn functions written like this into a loop over lists of UUID_subsets/model_subsets, scaled out via e.g. dask?
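One way to keep subsets picklable, sketched under the assumption that each worker can reopen the model itself: describe a subset as plain data (epc path, UUID strings, optional realization) rather than as a live model object. Note that model.subset is only a proposal in this thread; the epc file name, the spec layout, make_spec and run_on_subset are all hypothetical. Builtin dicts, lists and strings always pickle, which is what dask or multiprocessing need in order to ship tasks to workers:

```python
import pickle


def make_spec(epc_file, uuids, realization=None):
    """Describe a model subset with plain, picklable data only."""
    return {'epc_file': epc_file,
            'uuids': sorted(str(u) for u in uuids),
            'realization': realization}


def run_on_subset(spec):
    """Hypothetical worker: a real one would open spec['epc_file'] and use only spec['uuids']."""
    return spec['epc_file'], len(spec['uuids']), spec['realization']


specs = [make_spec('big_model.epc', ['uuid-a', 'uuid-b'], realization=r) for r in range(3)]

# each spec must survive a pickle round trip before it can be sent to another process
shipped = [pickle.loads(pickle.dumps(spec)) for spec in specs]
assert shipped == specs

# sequential stand-in for a parallel map
results = [run_on_subset(spec) for spec in shipped]
print(results)
```

With dask installed, the final list comprehension could become tasks = [dask.delayed(run_on_subset)(s) for s in specs] followed by dask.compute(*tasks). The design choice is that only the lightweight description crosses process boundaries, not an open model object, which may hold xml trees and file handles that do not pickle.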