catalystneuro / tye-lab-to-nwb

NWB Conversion project for the Tye lab at the Salk Institute.
MIT License
0 stars 0 forks source link

[Bug] dateutil time parser in OpenEphys legacy #28

Closed CodyCBakerPhD closed 11 months ago

CodyCBakerPhD commented 11 months ago

Laurel found an issue on a new session

(tye_lab_to_nwb_env) lkeyes@node32:~/Projects/GIT/tye-lab-to-nwb$ python src/tye_lab_to_nwb/neurotensin_valence/neurotensin_valence_convert_all_sessions.py
  0%|                                                                                                                                                | 0/1 [00:00<?, ?it/s]Source data is valid!
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/lkeyes/anaconda3/envs/tye_lab_to_nwb_env/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/tye-lab-to-nwb/src/tye_lab_to_nwb/neurotensin_valence/neurotensin_valence_convert_session.py", line 148, in session_to_nwb
    metadata = converter.get_metadata()
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/tye-lab-to-nwb/src/tye_lab_to_nwb/neurotensin_valence/neurotensin_valencenwbconverter.py", line 35, in get_metadata
    metadata = super().get_metadata()
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/tye-lab-to-nwb/src/neuroconv/src/neuroconv/nwbconverter.py", line 85, in get_metadata
    interface_metadata = interface.get_metadata()
                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/tye-lab-to-nwb/src/neuroconv/src/neuroconv/datainterfaces/ecephys/openephys/openephyslegacydatainterface.py", line 69, in get_metadata
    session_start_time = parse(date_created)
                         ^^^^^^^^^^^^^^^^^^^
  File "/home/lkeyes/anaconda3/envs/tye_lab_to_nwb_env/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 1368, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lkeyes/anaconda3/envs/tye_lab_to_nwb_env/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 643, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: '29-Apr-2020 16554'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/tye-lab-to-nwb/src/tye_lab_to_nwb/neurotensin_valence/neurotensin_valence_convert_all_sessions.py", line 92, in <module>
    parallel_convert_sessions(
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/tye-lab-to-nwb/src/tye_lab_to_nwb/neurotensin_valence/neurotensin_valence_convert_all_sessions.py", line 83, in parallel_convert_sessions
    future.result()
  File "/home/lkeyes/anaconda3/envs/tye_lab_to_nwb_env/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lkeyes/anaconda3/envs/tye_lab_to_nwb_env/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
dateutil.parser._parser.ParserError: Unknown string format: '29-Apr-2020 16554'

@weiglszonja Looks like this is a discrepency on the NeuroConv ophenephys side however, so that's where the true problem/fix is. Can you raise a separate issue over there when you make the PR?

As far as the actual core issue, it simply looks like the extra '16554' bit causes dateutil.parser to fail. It's not clear exactly what form it expects however, so we might want to swap to datetime.strptime + an explicit expected form. Also guessing the core issue there is that the string 16554 could be interpreted as 16:55:04 or 16:05:54 but without the necessary zero padding it can't disambiguate. Not that strptime would help with that...

weiglszonja commented 11 months ago

@CodyCBakerPhD Thanks for creating an issue here, I'll open a separate issue on neuroconv soon. I switched to datetime.strptime and with the expected format "%d-%b-%Y %H%M%S" this example ('29-Apr-2020 16554') will be parsed as 2020-04-29 16:55:04. Can we trust that?

CodyCBakerPhD commented 11 months ago

Hmm... Because it scans left to right to grant precedence...

TBH the safest thing to do might be to simply capture it, warn the user, and fall back to using only the date?

timestamp_string = extracted_start_time.split(" ")[1]
if len(timestamp_string) != 6:
    warn(f"Timestamp for starting time from openephys metadata is ambiguous ('{timestamp_string }')! Only the date will be auto-populated in metadata. Please update the timestamp manually to record this value with the highest known temporal resolution.")
    extracted_datetime = datetime.strptime(extracted_datetime, "%d-%b-%Y")
else:
    extracted_datetime = datetime.strptime(extracted_datetime, "%d-%b-%Y %H%M%S")
weiglszonja commented 11 months ago

Yeah, agreed. I'll change https://github.com/catalystneuro/neuroconv/issues/576 accordingly.

weiglszonja commented 11 months ago

This should also fixed by #32 once neuroconv is updated as instructed in #29. Reopen if the issue persists.