OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Filename parsing error when there is postfix beyond HHMMSS #182

Closed oftfrfbf closed 4 years ago

oftfrfbf commented 4 years ago

Hello, I am having trouble converting an EK80 file to netCDF. The code I used is below, run against a clone of the latest version of echopype.

>>> import echopype
>>> filename = "SD2019_WCS_v05-Phase0-D20190802-T033505-0.raw"
>>> data = echopype.convert.ek80.ConvertEK80(filename)
>>> data
<echopype.convert.ek80.ConvertEK80 object at 0x0000015CFFD55A88>
>>> netcdf = data.raw2nc()
11:25:55  converting file: SD2019_WCS_v05-Phase0-D20190802-T033505-0.raw
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\site-packages\echopype\convert\convertbase.py", line 200, in raw2nc
    self.save(".nc", save_path, combine_opt, overwrite, compress)
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\site-packages\echopype\convert\ek80.py", line 641, in save
    self._export_nc(save_settings, file_idx)
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\site-packages\echopype\convert\ek80.py", line 531, in _export_nc
    self._set_groups(raw_file, out_file, save_settings)
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\site-packages\echopype\convert\ek80.py", line 476, in _set_groups
    grp.set_toplevel(self._set_toplevel_dict(raw_file))  # top-level group
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\site-packages\echopype\convert\ek80.py", line 223, in _set_toplevel_dict
    out_dict['date_created'] = dt.strptime(filedate + '-' + filetime, '%Y%m%d-%H%M%S').isoformat() + 'Z'
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Users\r\AppData\Local\Programs\Python\Python37\lib\_strptime.py", line 359, in _strptime
    (data_string, format))
ValueError: time data 'T033505-0' does not match format '%Y%m%d-%H%M%S'

It looks like the string "T033505-0" extracted from the filename does not match the expected date and time format.
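The failure can be reproduced outside echopype with a minimal sketch. It assumes (this is a guess consistent with the traceback, not the confirmed echopype code) that the parser treats the last two hyphen-separated fields of the filename as the date and the time. The function name `naive_date_created` is hypothetical.

```python
import os
from datetime import datetime


def naive_date_created(raw_path):
    """Hypothetical stand-in for hard-coded filename parsing.

    Assumption: the date and time are taken to be the last two
    hyphen-separated fields of the filename stem.
    """
    stem = os.path.splitext(os.path.basename(raw_path))[0]
    fields = stem.split("-")
    filedate = fields[-2].replace("D", "")
    filetime = fields[-1].replace("T", "")
    parsed = datetime.strptime(filedate + "-" + filetime, "%Y%m%d-%H%M%S")
    return parsed.isoformat() + "Z"


# A standard three-field name parses cleanly.
print(naive_date_created("SURVEY-D20190802-T033505.raw"))

# The extra "-0" shifts the fields, so the time lands in the date slot.
try:
    naive_date_created("SD2019_WCS_v05-Phase0-D20190802-T033505-0.raw")
except ValueError as err:
    print(err)
```

Under that assumption, the trailing "-0" becomes the "time" field and "T033505" becomes the "date" field, which yields the same 'T033505-0' string reported in the ValueError.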

leewujung commented 4 years ago

@oftfrfbf : at some point I was going to swap out the current hard-coded filename parsing for a regular expression.

The pattern below works with the standard SURVEYNAME-DXXXXXX-TXXXXXX.raw filename:

import re
FILENAME_MATCHER = re.compile(r'(?P<survey>\w+)?-?D(?P<date>\w+)-T(?P<time>\w+)\.raw')

Could you take this a bit further to include the possible postfix beyond the time, as we see in the Saildrone data? Thanks!
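One possible way to extend the pattern above is sketched here. This is not the code merged into echopype; the `postfix` group, the fixed-width digit fields, the greedy survey prefix, and the helper name `parse_raw_filename` are all assumptions made for illustration.

```python
import os
import re
from datetime import datetime

# Hypothetical extension of the pattern: the survey prefix may itself
# contain hyphens, date and time are fixed-width digits, and an optional
# "-<postfix>" may follow the time.
FILENAME_MATCHER = re.compile(
    r"(?:(?P<survey>.+)-)?"
    r"D(?P<date>\d{8})-T(?P<time>\d{6})"
    r"(?:-(?P<postfix>\w+))?"
    r"\.raw"
)


def parse_raw_filename(raw_path):
    """Return (survey, timestamp, postfix) parsed from a .raw filename."""
    match = FILENAME_MATCHER.fullmatch(os.path.basename(raw_path))
    if match is None:
        raise ValueError("unrecognized raw filename: " + raw_path)
    timestamp = datetime.strptime(
        match.group("date") + "-" + match.group("time"), "%Y%m%d-%H%M%S"
    )
    return match.group("survey"), timestamp, match.group("postfix")


survey, timestamp, postfix = parse_raw_filename(
    "SD2019_WCS_v05-Phase0-D20190802-T033505-0.raw"
)
print(survey, timestamp.isoformat() + "Z", postfix)
```

Pinning the date and time to `\d{8}` and `\d{6}` keeps a stray postfix from being absorbed into the time field, and a name without a postfix still matches, with `postfix` set to `None`.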

Additional note: in the top-level group in the netCDF/zarr file, there's a data variable date_created (basically where you see this errored out). Another possibility is to simply take the ping time from the first ping to write this field. Let me know what you think.
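The alternative of taking date_created from the first ping could look roughly like the sketch below. It assumes the ping times are already available as a sequence of timezone-aware datetimes; `ping_times` and `date_created_from_pings` are hypothetical names, not echopype attributes or functions.

```python
from datetime import datetime, timezone


def date_created_from_pings(ping_times):
    """Format the first ping time as an ISO 8601 UTC string.

    Assumption: ping_times is a non-empty sequence of timezone-aware
    datetimes ordered by ping.
    """
    if not ping_times:
        raise ValueError("no pings available to derive date_created")
    first_ping = ping_times[0].astimezone(timezone.utc)
    return first_ping.strftime("%Y-%m-%dT%H:%M:%SZ")


# Example with made-up ping times.
pings = [
    datetime(2019, 8, 2, 3, 35, 5, tzinfo=timezone.utc),
    datetime(2019, 8, 2, 3, 35, 6, tzinfo=timezone.utc),
]
print(date_created_from_pings(pings))
```

This route sidesteps filename conventions entirely, at the cost of requiring the data to be parsed before the top-level group is written.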

leewujung commented 4 years ago

@Chuck-A and I took care of this. See PRs #187 and #188.