Model subset and/or iterator over sets of UUIDs

jrt54 commented 3 years ago

Some kind of resqpy class for iterating over realizations or specific instances of a model would be ideal for running applications on specific subsets of data (e.g. specific realizations, or with certain wells included or excluded).

Maybe this could be done with sets of UUIDs.

Something like this roughly:

wells_of_interest = ['wellname0', 'wellname1']

model = resqpy.Model(epc_file=<filename>)
UUID_set = set()
UUID_set.add(model.grid(realization=0).uuid)
UUID_set.add(model.property_uuid(title='pressure', realization=0))
for well in model.wells():
    if well.well_name in wells_of_interest:
        UUID_set.add(well.uuid)

# Ideally model_subset would have an attribute such as model_subset.parent pointing to the original model.
# I wouldn't want this to create a new model/epc file by default yet.
model_subset = model.subset(UUID_set)

# Assume I've written a function that computes something by looping over only the wells, the grid and the pressure,
# and which may append new properties (with new UUIDs) to the model_subset passed in.
my_own_function(model_subset)

# Maybe an option to write a model subset separately, with only the RESQML objects in the subset included.
model_subset.write_epc(epc_file=<new file name>)
# Possibly model_subset.update(model_subset.parent) followed by model.store_epc(), if I just want to update the original epc file rather than writing a new one.

Then it would be easy, in a notebook or script, to implement a single function (my_own_function) and test how it performs, and what results it gives, on different subsets of wells or different subsets/realizations of permeability.
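A minimal sketch of the proposed subset object, assuming a hypothetical `ModelSubset` class (not part of resqpy) that only records a parent model and a frozen set of UUIDs, and never touches an epc or h5 file. It is duck-typed, so `parent` can be any object and no resqpy API is called.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, FrozenSet


@dataclass(frozen=True)
class ModelSubset:
    """Hypothetical lightweight view: a parent model plus the UUIDs of interest."""

    parent: Any  # e.g. a resqpy Model; this class never modifies it
    uuids: FrozenSet[uuid.UUID] = field(default_factory=frozenset)

    def __contains__(self, u: uuid.UUID) -> bool:
        # Membership test against the selected UUIDs only
        return u in self.uuids

    def with_uuids(self, *extra: uuid.UUID) -> "ModelSubset":
        # Return a new, larger subset; this subset and the parent are left unchanged
        return ModelSubset(self.parent, self.uuids | frozenset(extra))
```

Making the view immutable means several subsets can share one parent without interfering, and the object pickles whenever the parent does.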

If model.subset returned an object that could be pickled, then I /think/ that would also be enough to scale functions written like this to a loop over lists of UUID_subsets/model_subsets, distributed via e.g. dask?
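One way to sidestep pickling a whole model is to ship only plain data to each worker. The sketch below assumes a hypothetical task layout of `(epc_file, frozenset of UUID strings)` and uses the standard library's `concurrent.futures` in place of dask; `make_specs`, `run_all` and `work` are illustrative names, not resqpy functions.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical task spec: an epc path plus the UUIDs (as strings) of one subset
TaskSpec = Tuple[str, FrozenSet[str]]


def make_specs(epc_file: str, subsets: List[set]) -> List[TaskSpec]:
    # Normalise every UUID to a string so each spec pickles cheaply and portably
    return [(epc_file, frozenset(str(u) for u in s)) for s in subsets]


def run_all(specs: List[TaskSpec], work: Callable[[TaskSpec], object], max_workers: int = 4) -> list:
    # Fail early if any spec cannot be pickled, as a process pool or dask would require
    for spec in specs:
        pickle.loads(pickle.dumps(spec))
    # Threads keep the sketch portable; swap in ProcessPoolExecutor or dask for real scaling
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, specs))  # results come back in spec order
```

In real use, `work` would open the model itself inside the worker (for example with `resqpy.model.Model(epc_file=spec[0])`) and then run my_own_function on the selected UUIDs, so no open file handles ever cross a process boundary.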

connortann commented 3 years ago

I wonder if resqpy.organise.OrganizationFeature is intended for this purpose?

andy-beer commented 3 years ago

For creating a new RESQML dataset containing a subset of objects, a new empty model can be created (for example with the convenience function resqpy.model.new_model()) and then the following method called repeatedly as needed: Model.copy_part_from_other_model()

That method identifies required referenced objects and recursively includes those in the copy.
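The approach above can be wrapped in a small helper. This is a sketch that assumes the resqpy methods `Model.part_for_uuid(uuid)` and `Model.copy_part_from_other_model(other_model, part)` behave as described in this thread; `copy_subset` itself is a hypothetical name, and the models are only duck-typed so the logic can be checked without resqpy installed.

```python
from typing import Iterable


def copy_subset(source_model, target_model, uuids: Iterable) -> list:
    """Copy the parts with the given UUIDs from source_model into target_model.

    Assumes resqpy's Model.part_for_uuid() and Model.copy_part_from_other_model();
    the latter is described as pulling in referenced parts recursively, so only
    the top-level UUIDs (grid, property, wells, ...) need to be listed here.
    """
    copied = []
    for u in uuids:
        part = source_model.part_for_uuid(u)
        if part is None:
            # UUID not present in the source model: skip it rather than fail
            continue
        target_model.copy_part_from_other_model(source_model, part)
        copied.append(part)
    return copied
```

With resqpy available, usage would be along the lines of creating the target with `resqpy.model.new_model('subset.epc')`, calling `copy_subset(source, target, UUID_set)`, and finally `target.store_epc()`; the file name is a placeholder.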

andy-beer commented 3 years ago

In RESQML, realization is a concept specific to properties. If other objects have competing versions, then multiple Interpretation objects linked to a single Feature object can be used to organise everything, with each Representation object relating to one Interpretation object.
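The organisation described above can be pictured with plain dataclasses. These are toy stand-ins, not resqpy's own feature, interpretation or representation classes: one Feature has several competing Interpretations, and each Representation points at exactly one Interpretation, so choosing an interpretation selects one consistent version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    title: str  # e.g. a fault or horizon (toy type)


@dataclass(frozen=True)
class Interpretation:
    feature: Feature
    title: str  # one competing version of the feature


@dataclass(frozen=True)
class Representation:
    interpretation: Interpretation  # each representation relates to exactly one interpretation
    title: str


def representations_for(reps: List[Representation], interp: Interpretation) -> List[Representation]:
    # Selecting one interpretation yields the representations for that version only
    return [r for r in reps if r.interpretation == interp]
```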

We also use extra metadata in some places in resqpy, but code using other RESQML APIs will not know what to do with such metadata.

andy-beer commented 3 years ago

With regard to Property realizations specifically, the resqpy PropertyCollection class has methods for selecting a subset of properties (returning a new PropertyCollection).
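The selection idea can be illustrated without resqpy. This toy sketch uses a hypothetical `PropertyRecord` type and `select_properties` function to show filtering by title and realization into a new collection while leaving the original untouched; the actual PropertyCollection method names and keyword arguments should be taken from the resqpy documentation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PropertyRecord:
    uuid: str
    title: str
    realization: Optional[int]  # None for properties without a realization number


def select_properties(records: Iterable[PropertyRecord], title: Optional[str] = None,
                      realization: Optional[int] = None) -> List[PropertyRecord]:
    # A criterion left as None matches everything; a new list is returned
    return [
        r for r in records
        if (title is None or r.title == title)
        and (realization is None or r.realization == realization)
    ]
```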

jrt54 commented 3 years ago

This sounds great.

One follow-up question: can this copying be done from a large existing model, with its own epc file, into a sort of "virtual" model_copy that has no epc file and hasn't been written to disk?

andy-beer commented 3 years ago

A newly created Model object does not get written to disc until the store_epc() method is called. However, the copy_part... methods currently copy the hdf5 data. That is in line with our policy of having a one-to-one correspondence between epc and h5 files. With a little development, we could use existing flexibility in the underlying methods (and in RESQML) to have the subset model point to arrays in the existing large h5 file instead.
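The difference between copying arrays and pointing at them can be sketched with a toy class. `SubsetModelToy`, `ArrayAddress` and the dataset path used in the usage check are hypothetical; a real implementation would read through h5py or resqpy. The subset only remembers where each array lives and reads it on demand, at the cost of no longer being self-contained: it works only while the original h5 file remains available.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical address of an array in an existing h5 file: (file name, internal dataset path)
ArrayAddress = Tuple[str, str]


@dataclass
class SubsetModelToy:
    """Toy subset model that stores array addresses instead of array copies."""

    read_array: Callable[[ArrayAddress], List[float]]  # e.g. a wrapper around h5py in real use
    refs: Dict[str, ArrayAddress] = field(default_factory=dict)

    def add_reference(self, part_uuid: str, address: ArrayAddress) -> None:
        # Nothing is read or copied here; only the address is remembered
        self.refs[part_uuid] = address

    def array_for(self, part_uuid: str) -> List[float]:
        # Data is read from the original h5 file only when it is actually needed
        return self.read_array(self.refs[part_uuid])
```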

bp / resqpy

Model subset and/or iterator over sets of UUIDs #25