DASDAE / dascore

A python library for distributed fiber optic sensing

Could not index PRODML v. 2.0 format #221

Closed ahmadtourei closed 1 year ago

ahmadtourei commented 1 year ago

Description

The index file is not created in the data directory, so I could not get any patches out of the spool. The data are in PRODML v. 2.0 format.

Please note that no error occurred when getting the spool, and indexing reached 100%:

import dascore as dc

sp = dc.spool(data_path)

d-chambers commented 1 year ago

Darn, can you share one of those files with me so I can see what's going on? Or is it the same as the last one you sent over?

ahmadtourei commented 1 year ago

Just sent. Thanks!

ahmadtourei commented 1 year ago

The index file is created. However, I got a "CoordDataError" when getting a patch out of the spool:

---------------------------------------------------------------------------
CoordDataError                            Traceback (most recent call last)
Cell In[7], line 2
      1 # get sampling rate, channel spacing, and gauge length from the first patch
----> 2 patch_0 = sp[0]
      3 gauge_length = patch_0.attrs['gauge_length']
      4 print("Gauge length = ", gauge_length)

File ~/coding/dascore/dascore/core/spool.py:176, in DataFrameSpool.__getitem__(self, item)
    175 def __getitem__(self, item):
--> 176     out = self._get_patches_from_index(item)
    177     # a single index was used, should return a single patch
    178     if not isinstance(item, slice):

File ~/coding/dascore/dascore/core/spool.py:214, in DataFrameSpool._get_patches_from_index(self, df_ind)
    212 assert not df1.empty
    213 joined = df1.join(source.drop(columns=df1.columns, errors="ignore"))
--> 214 return self._patch_from_instruction_df(joined)

File ~/coding/dascore/dascore/core/spool.py:224, in DataFrameSpool._patch_from_instruction_df(self, joined)
    221 for patch_kwargs in df_dict_list:
    222     # convert kwargs to format understood by parser/patch.select
    223     kwargs = _convert_min_max_in_kwargs(patch_kwargs, joined)
--> 224     patch = self._load_patch(kwargs)
    225     # apply any trimming needed on patch
    226     select_kwargs = {
    227         i: v
    228         for i, v in kwargs.items()
    229         if i in patch.dims or i in patch.coords.coord_map
    230     }

File ~/coding/dascore/dascore/clients/dirspool.py:134, in DirectorySpool._load_patch(self, kwargs)
    132 final_kwargs = dict(kwargs)
    133 final_kwargs.update(self._select_kwargs)
--> 134 patch = dc.read(**final_kwargs)[0]
    135 return patch

File ~/coding/dascore/dascore/io/core.py:507, in read(path, file_format, file_version, time, distance, **kwargs)
    505 required_type = formatter.read._required_type
    506 path = man.get_resource(required_type)
--> 507 out = formatter.read(
    508     path,
    509     file_version=file_version,
    510     time=time,
    511     distance=distance,
...
    636     )
--> 637     raise CoordDataError(msg)
    638 return data

CoordDataError: Data array has a shape of (5099, 16384) which doesnt match the coordinate manager shape of (16384, 5099).
d-chambers commented 1 year ago

Thanks for finding this! It turns out some PRODML files use time/distance dimension ordering and others use distance/time. We had just assumed the ordering would always be the same.
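
To illustrate the ordering mismatch described above, here is a minimal sketch of how a reader could work out the axis reordering from the dimension names stored in a file. It is not the fix that was applied in dascore; the helper names `axis_permutation` and `permuted_shape` are hypothetical, and the assignment of "time" and "distance" to the two axes is an assumption. Only the shapes come from the error message.

```python
def axis_permutation(file_dims, expected_dims):
    """Return the axis order that rearranges file_dims into expected_dims."""
    if sorted(file_dims) != sorted(expected_dims):
        raise ValueError(
            f"dimension names differ: {file_dims!r} vs {expected_dims!r}"
        )
    # For each expected dimension, look up where it sits in the file.
    return tuple(file_dims.index(name) for name in expected_dims)


def permuted_shape(shape, perm):
    """Shape of an array after its axes are reordered by perm."""
    return tuple(shape[axis] for axis in perm)


# Shapes are from the CoordDataError above; the dimension labels are assumed.
file_dims = ("time", "distance")      # order stored in this particular file
expected_dims = ("distance", "time")  # order the reader assumed
perm = axis_permutation(file_dims, expected_dims)
new_shape = permuted_shape((5099, 16384), perm)
print(perm, new_shape)
```

The resulting tuple can be passed to `numpy.transpose(data, axes=perm)` so that the data array matches the coordinate shape instead of relying on a fixed ordering.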

d-chambers commented 1 year ago

Hey @ahmadtourei,

Should be fixed now. Please take it for a spin and let me know if not.

ahmadtourei commented 1 year ago

The index file is not created for these PRODML v. 2.0 files, and the error below is raised:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 9
      6 # get the spool of data form the defined data path (will index patches for the first time)
      7 sp = dc.spool(data_path)
----> 9 print(sp)
     11 # print the contents of first 5 patches
     12 # content_df = sp.get_contents()
     13 # content_df.head()

File ~/coding/dascore/dascore/core/spool.py:65, in BaseSpool.__str__(self)
     64 def __str__(self):
---> 65     return str(self.__rich__())

File ~/coding/dascore/dascore/clients/dirspool.py:67, in DirectorySpool.__rich__(self)
     65 def __rich__(self):
     66     """Augment rich string directory spool stuff."""
---> 67     base = super().__rich__()
     68     path = self.indexer.path
     69     kwargs = self._select_kwargs

File ~/coding/dascore/dascore/core/spool.py:59, in BaseSpool.__rich__(self)
     57 text += Text(self.__class__.__name__, style=self._rich_style)
     58 text += Text(" 🧵 ")
---> 59 patch_len = len(self)
     60 text += Text(f"({patch_len:d}")
     61 text += Text(" Patches)") if patch_len != 1 else Text(" Patch)")

File ~/coding/dascore/dascore/core/spool.py:326, in DataFrameSpool.__len__(self)
    325 def __len__(self):
--> 326     return len(self._df)

File ~/coding/dascore/dascore/utils/misc.py:263, in CacheDescriptor.__get__(self, instance, owner)
    261 if self._name not in cache:
    262     func = getattr(instance, self._func_name)
--> 263     out = func(*self._args, **self._kwargs)
    264     cache[self._name] = out
    265 return cache[self._name]

File ~/coding/dascore/dascore/clients/dirspool.py:77, in DirectorySpool._get_df(self)
     74 def _get_df(self):
     75     """Get the dataframe of current contents."""
     76     out = adjust_segments(
---> 77         self._source_df, ignore_bad_kwargs=True, **self._select_kwargs
     78     )
...
    288 # takes care of other types as well as for example NROWS for
    289 # Tables and EXTDIM for EArrays
    290 format_version = self._v__format_version

AttributeError: Attribute 'RawDescription' does not exist in node: '/Acquisition/Raw[0]'
d-chambers commented 1 year ago

So is this regarding the test file named "DOSS_20220723T111500_430400Z.hdf5"? When I run this code in the same directory as that file:

import dascore as dc

spool = dc.spool(".").update()
patch = spool[0]
print(patch)

it works fine. Are you also on the current master branch? Perhaps there are other files you are using?
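
If it is unclear which file in a directory triggers the failure, one way to narrow it down is to try each file on its own and record the exceptions. The sketch below is a generic helper, not part of dascore; `find_failing_files` is a hypothetical name, and the reader is passed in as a parameter so that any callable taking a path can be used.

```python
from pathlib import Path


def find_failing_files(directory, reader, pattern="*"):
    """Call reader(path) on each matching file; return {path: exception}."""
    failures = {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        try:
            reader(path)
        except Exception as exc:  # broad on purpose: record any failure
            failures[path] = exc
    return failures
```

For example, `find_failing_files(data_path, dc.read, "*.h5")` would attempt to read every `.h5` file under `data_path` and return the ones that raised, along with the exception for each. Reading each file in full may be slow for large data sets.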

ahmadtourei commented 1 year ago

No, this is regarding the file "BM73-22_500Hz_UTC_20230718_170000.h5". I'm on the master branch.
