MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License

Trouble reading data subset #277

Closed hevgyrt closed 5 months ago

hevgyrt commented 7 months ago

Describe the bug:

NOTE: This issue is copied from the dolfyn repo. I have the following two problems when using the nens argument of the dolfyn.read function.

To Reproduce:

  1. If I provide start values larger than 0, like dat = dolfyn.read('myfile.ad2cp',nens=[3,150]), I get
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[70], line 1
----> 1 dat = dolfyn.read('myfile.ad2cp',nens=[3,150])

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/api.py:103, in read(fname, userdata, nens, **kwargs)
     99     func_map = dict(RDI=read_rdi,
    100                     nortek=read_nortek,
    101                     signature=read_signature)
    102     func = func_map[file_type]
--> 103 return func(fname, userdata=userdata, nens=nens, **kwargs)

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2.py:69, in read_signature(filename, userdata, nens, rebuild_index, debug, **kwargs)
     67 d = rdr.readfile(nens[0], nens[1])
     68 rdr.sci_data(d)
---> 69 out = _reorg(d)
     70 _reduce(out)
     72 # Convert time to dt64 and fill gaps

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2.py:413, in _reorg(dat)
    410 outdat['long_name'].update(dnow['long_name'])
    411 outdat['standard_name'].update(dnow['standard_name'])
    412 cfg['burst_config' + tag] = lib._headconfig_int2dict(
--> 413     lib._collapse(dnow['config'], exclude=collapse_exclude,
    414                   name='config'))
    415 outdat['coords']['time' + tag] = lib._calc_time(
    416     dnow['year'] + 1900,
    417     dnow['month'],
   (...)
    421     dnow['second'],
    422     dnow['usec100'].astype('uint32') * 100)
    423 tmp = lib._beams_cy_int2dict(
    424     lib._collapse(dnow['beam_config'], exclude=collapse_exclude,
    425                   name='beam_config'), 21)

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2_lib.py:429, in _collapse(vec, name, exclude)
    425 def _collapse(vec, name=None, exclude=[]):
    426     """Check that the input vector is uniform, then collapse it to a
    427     single value, otherwise raise a warning.
    428     """
--> 429     if _isuniform(vec):
    430         return vec[0]
    431     elif _isuniform(vec, exclude=exclude):

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2_lib.py:422, in _isuniform(vec, exclude)
    420 if len(exclude):
    421     return len(set(np.unique(vec)) - set(exclude)) <= 1
--> 422 return np.all(vec == vec[0])

IndexError: index 0 is out of bounds for axis 0 with size 0
  2. There seems to be an upper bound on the stop value that is much lower (here 15000) than the number of points in my dataset (on the order of 100k at least):
    
IndexError                                Traceback (most recent call last)
Cell In[72], line 1
----> 1 dat = dolfyn.read('myfile.ad2cp',nens=[0,15000])

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/api.py:103, in read(fname, userdata, nens, **kwargs)
     99     func_map = dict(RDI=read_rdi,
    100                     nortek=read_nortek,
    101                     signature=read_signature)
    102     func = func_map[file_type]
--> 103 return func(fname, userdata=userdata, nens=nens, **kwargs)

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2.py:67, in read_signature(filename, userdata, nens, rebuild_index, debug, **kwargs)
     64 userdata = _find_userdata(filename, userdata)
     66 rdr = _Ad2cpReader(filename, rebuild_index=rebuild_index, debug=debug)
---> 67 d = rdr.readfile(nens[0], nens[1])
     68 rdr.sci_data(d)
     69 out = _reorg(d)

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2.py:311, in _Ad2cpReader.readfile(self, ens_start, ens_stop)
    307 if sz != rdr._N[tmp_idx]:
    308     raise Exception(
    309         "The number of samples in this 'Altimeter Raw' "
    310         "burst is different from prior bursts.")
--> 311 self._read_burst(id, outdat[id], c26)
    312 outdat[id]['ensemble'][c26] = c
    313 c26 += 1

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2.py:251, in _Ad2cpReader._read_burst(self, id, dat, c, echo)
    249 def _read_burst(self, id, dat, c, echo=False):
    250     rdr = self._burst_readers[id]
--> 251     rdr.read_into(self.f, dat, c)

File ~/miniconda3/envs/ekok/lib/python3.9/site-packages/dolfyn/io/nortek2_defs.py:79, in _DataDef.read_into(self, fobj, data, ens, cs)
     77 for nm, shp, d in zip(self._names, self._shape, dat_tuple):
     78     try:
---> 79         data[nm][..., ens] = d
     80     except ValueError:
     81         data[nm][..., ens] = np.asarray(d).reshape(shp)

IndexError: index 11 is out of bounds for axis 0 with size 11
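
For reference, the last frame fails on the assignment data[nm][..., ens] = d. The NumPy-only sketch below (not dolfyn code) shows that the same kind of IndexError appears when an array allocated for 11 records is asked to store a 12th. The buffer length of 11 is taken from "size 11" in the message, and the write_record helper is hypothetical. Why the reader would end up with a buffer that is too short is not established here.

```python
import numpy as np

# Assumed buffer length, taken from "size 11" in the error message above.
n_allocated = 11
buffer = np.zeros(n_allocated)


def write_record(buf, idx, value):
    """Hypothetical helper: store one record the way the failing line does."""
    buf[..., idx] = value


# Records with indices 0..10 fit into the buffer.
for idx in range(n_allocated):
    write_record(buffer, idx, float(idx))

# A 12th record (index 11) falls one past the end of the buffer.
try:
    write_record(buffer, n_allocated, 11.0)
    error_message = None
except IndexError as err:
    error_message = str(err)

print(error_message)
```

If this reading is right, the number of "Altimeter Raw" records actually present in the requested range is larger than the number the output array was sized for.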



### Expected behavior:
I expected both calls to read the requested ensemble range without errors.

### Screenshots:
Output text is given above

### Desktop (please complete the following information):
 - OS: Ubuntu Jammy

### Additional context:
The acquisition used a Signature 500 with concurrent Burst and waves sampling and the echosounder option enabled.

ssolson commented 7 months ago

Hey @hevgyrt, thank you for your interest in MHKiT and for bringing this issue to our attention. It looks like @jmcvey3 has assigned himself to address it and can hopefully get a resolution for you to try soon. Thanks, James, for taking the lead.

jmcvey3 commented 7 months ago

Hi @hevgyrt, see if you can pull PR #280 and let me know how that works. It should solve problem 1 listed above, but I wasn't able to replicate problem 2.

Both of these bugs result from how Nortek stores "raw altimeter" data (labeled "ID 26" in their integration docs), which appear to be initial measurements used to figure out how far away the water surface is. In the sample file I have, this ID is only stored in ensemble 0, so starting the read at an ensemble greater than 0 raises an error. For problem 2, it seems that "ID 26" is saved multiple times (at least 11 times in the first 15000 ensembles) in your data file. The Nortek Signature reader was set up to handle your situation, so I'm not sure what it's getting caught up on.
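
As a rough illustration of problem 1 (a standalone sketch, not the code in PR #280): if the only "ID 26" record sits in ensemble 0 and the read starts later, the per-record config vector is empty, and a uniformity check that indexes vec[0] fails the way the first traceback does. The names is_uniform and collapse_or_none, the made-up config value, and the half-open range check are assumptions for illustration only.

```python
import numpy as np


def is_uniform(vec):
    """Same comparison as in the traceback: every element against the first."""
    return bool(np.all(vec == vec[0]))


def collapse_or_none(vec):
    """Hypothetical guard: return None for an empty vector instead of indexing it."""
    vec = np.asarray(vec)
    if vec.size == 0:
        return None
    if is_uniform(vec):
        return vec[0]
    raise ValueError("vector is not uniform")


# Assume "ID 26" was recorded only in ensemble 0, as in the sample file.
id26_ensembles = np.array([0])
config_values = np.array([7])  # one made-up config value per ID 26 record

# Reading ensembles 3 to 150 selects none of the ID 26 records.
in_range = (id26_ensembles >= 3) & (id26_ensembles < 150)
subset = config_values[in_range]

try:
    is_uniform(subset)
    error_message = None
except IndexError as err:
    error_message = str(err)

print(error_message)
print(collapse_or_none(subset))
```

Checking for an empty vector before collapsing it is one possible way to let a read that starts after ensemble 0 proceed. Whether the actual fix takes this route is not stated in this thread.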

jmcvey3 commented 6 months ago

Confirmed that these changes fixed the bug (https://github.com/lkilcher/dolfyn/issues/122).