gafusion / omas

Ordered Multidimensional Array Structure
http://gafusion.github.io/omas
MIT License

Reading h5 with `consistency_check = False` fails #218

Closed kripnerl closed 1 year ago

kripnerl commented 1 year ago

Example pull request: #217

https://github.com/gafusion/omas/actions/runs/3413780388/jobs/5680901424

It seems that `consistency_check` is also what performs the type conversions. In my case, reading failed on fields of type str, which are stored in the h5 file as bytes. This resulted in:

test_load_omas_coil_description.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../coils/coils_io/pf_coils.py:149: in load_pf_configuration_from_ods_h5
    ds.coords["coil_label"] = ods["pf_active.coil.:.name"]
../../../anaconda3/envs/compass/lib/python3.7/site-packages/omas/omas_core.py:1322: in __getitem__
    return value.__getitem__(key[1:], cocos_and_coords)
../../../anaconda3/envs/compass/lib/python3.7/site-packages/omas/omas_core.py:1322: in __getitem__
    return value.__getitem__(key[1:], cocos_and_coords)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = [{'element': [{'geometry': {'rectangle': {'height': 0.208, 'r': 0.42, 'width': 0.095, 'z': 0.1015}}, 'turns_with_sign'...z': -0.42}}, 'turns_with_sign': 40.0}], 'identifier': b'PF4L', 'mass': 1145.0, 'name': b'PF4L', 'resistance': 21910.0}]
key = [slice(None, None, None), 'name'], cocos_and_coords = True

    def __getitem__(self, key, cocos_and_coords=True):
        """
        ODS getitem method allows support for different syntaxes to access data

        :param key: different syntaxes to access data, for example:
              * ods['equilibrium']['time_slice'][0]['profiles_2d'][0]['psi']   # standard Python dictionary syntax
              * ods['equilibrium.time_slice[0].profiles_2d[0].psi']            # IMAS hierarchical tree syntax
              * ods['equilibrium.time_slice.0.profiles_2d.0.psi']              # dot separated string syntax
              * ods[['equilibrium','time_slice',0,'profiles_2d',0,'psi']]      # list of nodes syntax

            NOTE: Python3.6+ f-strings can be very handy when looping over arrays of structures. For example:
            for time_index in range(len(ods[f'equilibrium.time_slice'])):
                for grid_index in range(len(ods[f'equilibrium.time_slice.{time_index}.profiles_2d'])):
                    print(ods[f'equilibrium.time_slice.{time_index}.profiles_2d.{grid_index}.psi'])

        :param cocos_and_coords: processing of cocos transforms and coordinates interpolations [True/False/None]
              * True: enabled COCOS and enabled interpolation
              * False: enabled COCOS and disabled interpolation
              * None: disabled COCOS and disabled interpolation

        :return: ODS value
        """

        # handle pattern match
        if isinstance(key, str) and key.startswith('@'):
            key = self.search_paths(key, 1, '@')[0]

        # handle individual keys as well as full paths
        key = p2l(key)

        if not len(key):
            return self

        # negative numbers are used to address arrays of structures from the end
        if isinstance(key[0], int) and key[0] < 0:
            if self.omas_data is None:
                key[0] = 0
            elif isinstance(self.omas_data, list):
                if not len(self.omas_data):
                    key[0] = 0
                else:
                    key[0] = len(self.omas_data) + key[0]
        # '+' is used to append new entry in array structure
        if key[0] == '+':
            if self.omas_data is None:
                key[0] = 0
            elif isinstance(self.omas_data, list):
                key[0] = len(self.omas_data)
        # slice
        elif isinstance(key[0], str) and ':' in key[0]:
            key[0] = slice(*map(lambda x: int(x.strip()) if x.strip() else None, key[0].split(':')))

        dynamically_created = False

        # data slicing
        # NOTE: OMAS will try to return numpy arrays if the sliced data can be stacked in a uniform array
        # otherwise a list will be returned (that's where we do `return data0` below)
        if isinstance(key[0], slice):
            data0 = []
            for k in self.keys(dynamic=1)[key[0]]:
                try:
                    data0.append(self.__getitem__([k] + key[1:], cocos_and_coords))
                except ValueError:
                    data0.append([])
            # raise an error if no data is returned
            if not len(data0):
                raise ValueError('`%s` has no data' % self.location)

            # if they are filled but do not have the same number of dimensions
            shapes = [numpy.asarray(item).shape for item in data0 if numpy.asarray(item).size]
            if not len(shapes):
                return numpy.asarray(data0)
            if not all(len(shape) == len(shapes[0]) for shape in shapes[1:]):
                return data0

            # find maximum shape
            max_shape = []
            for shape in shapes:
                for k, s in enumerate(shape):
                    if len(max_shape) < k + 1:
                        max_shape.append(s)
                    else:
                        max_shape[k] = max(max_shape[k], s)
            max_shape = tuple([len(data0)] + max_shape)

            # find types
            dtypes = [numpy.asarray(item).dtype for item in data0 if numpy.asarray(item).size]
            if not len(dtypes):
                return numpy.asarray(data0)
            if not all(dtype.char == dtypes[0].char for dtype in dtypes[1:]):
                return data0
            dtype = dtypes[0]

            # array of strings
            if dtype.char in 'U':
                return numpy.asarray(data0)

            # define an empty array of shape max_shape
            if dtype.char in 'iIl':
                data = numpy.full(max_shape, 0)
            elif dtype.char in 'df':
                data = numpy.full(max_shape, numpy.nan)
            elif dtype.char in 'O':
                data = numpy.full(max_shape, object())
            else:
>               raise ValueError('Not an IMAS data type %s' % dtype.char)
E               ValueError: Not an IMAS data type S

The above error is raised on

ods["pf_active.coil.:.name"]
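
The dtype issue behind this can be illustrated without OMAS or an h5 file. The following is only a minimal sketch using plain numpy: `decode_bytes` is a hypothetical helper (not part of OMAS), and the second coil name is made up for illustration. It shows that an array built from bytes has `dtype.char == 'S'`, which the slicing code in the traceback does not handle, while decoding to str first gives `'U'`, which it does.

```python
import numpy


def decode_bytes(value, encoding="utf-8"):
    """Hypothetical helper (not an OMAS function): decode bytes to str,
    return any other value unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Names as they come back from the h5 file when no type conversion happens.
# b"PF4L" appears in the traceback above; b"COIL_B" is an invented example.
raw_names = [b"PF4L", b"COIL_B"]

# Bytes stack into a numpy byte-string array, dtype char 'S'
raw_char = numpy.asarray(raw_names).dtype.char

# After decoding, the same data stacks into a unicode array, dtype char 'U'
decoded = numpy.asarray([decode_bytes(name) for name in raw_names])
decoded_char = decoded.dtype.char

print(raw_char, decoded_char)
```

Decoding the values after loading, as sketched here, would only be a workaround on the user side; it does not change how OMAS itself reads the file.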

A partial fix would be to split `consistency_check` into two options: one controlling IMAS type conversion and the other controlling whether all fields follow the IMAS structure. This would be very useful for our use case, where we want an extended IMAS schema for data description. Some data (for example the mass of a pf_active coil or time-dependent conductivity) are not in the structure, and we would benefit from being able to extend the schema while keeping the benefits of OMAS.

smithsp commented 1 year ago

I looked into this a bit, but did not make any progress.

orso82 commented 1 year ago

@kripnerl to extend the data schema, use the extra_structures feature in OMAS: https://gafusion.github.io/omas/auto_examples/extra_structures.html#sphx-glr-auto-examples-extra-structures-py

Would this solve your issue?

kripnerl commented 1 year ago

@orso82 I will check this. It could solve our issue.
