bp / resqpy

Python API for working with RESQML models
https://resqpy.readthedocs.io/en/latest/
MIT License

Base class for common attributes and methods #20

Closed connortann closed 3 years ago

connortann commented 3 years ago

We could create a base class, and inherit from this in all other resqml class objects.

This could help reduce duplication of code, and standardise common attributes and methods.

The base class could handle things such as:

It would also help make the classes more robust: we could have generic Model methods that would be compatible with any class that inherits from Base.
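As a rough illustration of the proposal, here is a minimal sketch of what such a base class might look like. Everything in it is an assumption made for the example rather than resqpy's actual design: the names BaseObject, Grid, Trajectory and describe are hypothetical, as is the choice of title and uuid as the shared attributes.

```python
import uuid as uuid_module


class BaseObject:
    """Hypothetical base class holding attributes shared by all objects."""

    # Subclasses override this with their RESQML type name (illustrative only).
    resqml_type = None

    def __init__(self, title, uuid=None):
        self.title = title
        # Generate a fresh uuid unless an existing one is supplied.
        self.uuid = uuid if uuid is not None else uuid_module.uuid4()

    def __eq__(self, other):
        # Two objects are treated as the same if they share a uuid.
        return isinstance(other, BaseObject) and self.uuid == other.uuid

    def __hash__(self):
        return hash(self.uuid)


class Grid(BaseObject):
    resqml_type = "IjkGridRepresentation"


class Trajectory(BaseObject):
    resqml_type = "WellboreTrajectoryRepresentation"


def describe(obj):
    """Generic helper that works with any class inheriting from BaseObject."""
    return f"{obj.resqml_type} '{obj.title}' ({obj.uuid})"


grid = Grid("main grid")
well = Trajectory("well A")
summaries = [describe(obj) for obj in (grid, well)]
```

The point of the sketch is the last function: because both classes inherit the same attributes, a generic routine does not need to know which class it has been given.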

jrt54 commented 3 years ago

I'd also suggest adding an attribute or method like "numpydata" to each (or at least some) of these classes, exposing the underlying numpy array of data, for simple things like a single property of grid data.

I like the light-touch approach of most classes, so that when I get the grid object it doesn't fetch a huge amount of data upfront. But I think you could get the best of both worlds by making the "numpydata" attribute a class that behaves like a numpy array, where the initialization only defines metadata like shape and where we override __getitem__ (https://numpy.org/doc/stable/reference/generated/numpy.ndarray.__getitem__.html). And maybe if someone calls resqmlobject.numpydata(readonly=False), they get an object with a special __setitem__ method as well, which basically calls low-level h5 slicing routines.

The initialization could basically create a special numpydata object with only metadata like shape defined, so that you can use these objects without necessarily needing to fetch all the data. It would also give an easy and familiar numpy-like syntax for slicing, including on-the-fly slicing of large datasets. resqmlproperty.numpydata[zidx,:,:] would return just one z-slice of data and, if my understanding of h5 is correct, it would skip redundant IO entirely.
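To make the idea concrete, here is a minimal sketch of such a lazy wrapper. It is only an illustration under stated assumptions: LazyArray and CountingStore are hypothetical names, not part of resqpy. The backing store is a plain Python stand-in so the example runs without any hdf5 file; in practice the opener would presumably return an h5py dataset, which supports reading a slice without loading the whole array.

```python
class LazyArray:
    """Hypothetical numpy-like view: metadata up front, data read on slicing."""

    def __init__(self, open_dataset, shape, dtype, readonly=True):
        # open_dataset is a callable returning any object that supports
        # [] indexing (assumed to be an h5py dataset in real use).
        self._open_dataset = open_dataset
        self.shape = tuple(shape)
        self.dtype = dtype
        self.readonly = readonly

    @property
    def ndim(self):
        return len(self.shape)

    def __getitem__(self, key):
        # Only the requested key is passed on to the backing store.
        return self._open_dataset()[key]

    def __setitem__(self, key, value):
        if self.readonly:
            raise TypeError("this array was opened read-only")
        self._open_dataset()[key] = value


class CountingStore:
    """Stand-in for an hdf5 dataset: a list of rows plus a log of reads."""

    def __init__(self, data):
        self.data = data
        self.reads = []

    def __getitem__(self, key):
        self.reads.append(key)
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value


store = CountingStore([[1, 2], [3, 4], [5, 6]])
lazy = LazyArray(lambda: store, shape=(3, 2), dtype="int64")

# Shape is available without touching the store at all.
reads_before_slicing = list(store.reads)

# Indexing triggers exactly one read, for the requested row only.
first_row = lazy[0]
```

Whether the read is actually cheap depends on the backing store; the wrapper itself only guarantees that nothing is fetched until something is indexed.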

andy-beer commented 3 years ago

Hmm, I don't feel entirely comfortable with the numpydata idea. Many RESQML classes involve several specialist hdf5 arrays, and the existing resqpy code has ways of reading and writing these arrays depending on the class. A general re-engineering of that approach would be a massive undertaking.

If we limit our consideration to numerical property objects, then our existing PropertyCollection class with methods like single_array_ref() can deliver numpy arrays to the calling code. With regard to slicing directly from hdf5, we have the existing method for doing that, though I appreciate that is at a low level.

For most of our modelling workflows, I don't think more direct access to the arrays in the hdf5 is appropriate. I see it as a specialist activity. However, I'm open to a discussion about what a better interface to property objects could look like.