LorenFrankLab / trodes_to_nwb

Converts data from SpikeGadgets to the NWB Data Format
MIT License

Worker memory overuse when converting to nwb #92

Closed rpswenson closed 2 weeks ago

rpswenson commented 2 weeks ago

When attempting to run:

from trodes_to_nwb.convert import create_nwbs
path = data_path # from step 1
output_dir = "/stelmo/temp/"

# for dual site recording:
probe_metadata_paths = ['/cumulus/rio/BS28_raw_old/tetrode_12.5.yml',
                        '/cumulus/rio/BS28_raw_old/128c-4s8mm6cm-20um-40um-sl.yml']

create_nwbs(
    path,
    output_dir=output_dir,
    header_reconfig_path='/cumulus/rio/BS28_raw_old/20231107/20231107_BS28.trodesconf',
    #probe_metadata_paths=['/home/mcoulter/tetrode_12.5.yml'],
    #probe_metadata_paths=None,
    probe_metadata_paths=probe_metadata_paths,
    convert_video=False,
    n_workers=8,
    query_expression=None,
)

I get this error:

Cell In[2], line 11
      7 # for dual site recording:
      8 probe_metadata_paths=['/cumulus/rio/BS28_raw_old/tetrode_12.5.yml',
      9                           '/cumulus/rio/BS28_raw_old/128c-4s8mm6cm-20um-40um-sl.yml'],
---> 11 create_nwbs(
     12     path,
     13     output_dir=output_dir,
     14     header_reconfig_path='/cumulus/rio/BS28_raw_old/20231107/20231107_BS28.trodesconf',
     15     #probe_metadata_paths=['/home/mcoulter/tetrode_12.5.yml'],
     16     #probe_metadata_paths=None,
     17     probe_metadata_paths=probe_metadata_paths,
     18     convert_video=False,
     19     n_workers=8,
     20     query_expression=None,
     21 )

File ~/trodes_to_nwb/src/trodes_to_nwb/convert.py:179, in create_nwbs(path, header_reconfig_path, probe_metadata_paths, output_dir, video_directory, convert_video, n_workers, query_expression, disable_ptp)
    177 # print out error results
    178 for args, future in zip(argument_list, futures):
--> 179     result = future.result()
    180     if result is not True:
    181         print(args, result)

File ~/miniforge3/envs/trodes_to_nwb/lib/python3.12/site-packages/distributed/client.py:328, in Future.result(self, timeout)
    326 self._verify_initialized()
    327 with shorten_traceback():
--> 328     return self.client.sync(self._result, callback_timeout=timeout)

File ~/miniforge3/envs/trodes_to_nwb/lib/python3.12/site-packages/distributed/client.py:336, in Future._result(self, raiseit)
    334 if raiseit:
    335     typ, exc, tb = exc
--> 336     raise exc.with_traceback(tb)
    337 else:
    338     return exc

KilledWorker: Attempted to run task 'pass_func-4e6bdfebabe77fefd27ed79e0ab49e55' on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:40071. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.

even when running the code on virga, which should have more free memory. What can I do to alleviate this issue?

samuelbray32 commented 2 weeks ago

If you're converting large sessions (e.g. continuous overnight recordings, lots of probes), you probably can't convert more than one session at a time on any of the computers.

I'd suggest setting n_workers=1 in most cases. Even with normal-sized sessions, 8 is very ambitious: each worker needs a decent amount of RAM to do the conversion, so the total demand scales with the worker count.
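If you want a rough rule of thumb rather than trial and error, a minimal sketch is below. The per-session peak memory figure is an assumption you would have to measure for your own recordings (e.g. by converting one session with n_workers=1 and watching RSS); the helper name and the 4 GiB headroom default are illustrative, not part of trodes_to_nwb.

```python
import os


def suggest_n_workers(per_session_gib, total_gib=None, reserve_gib=4.0):
    """Rough estimate of how many parallel session conversions fit in RAM.

    per_session_gib: measured peak memory of one conversion (an assumption
    you supply, not something trodes_to_nwb reports).
    total_gib: machine RAM; detected from the OS if not given (POSIX only).
    reserve_gib: headroom left for the OS and the Dask scheduler.
    """
    if total_gib is None:
        total_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    usable = total_gib - reserve_gib
    # Never suggest zero workers; fall back to serial conversion.
    return max(1, int(usable // per_session_gib))
```

With 128 GiB of RAM and a measured ~32 GiB peak per session, this suggests 3 workers, not 8; when in doubt, n_workers=1 is the safe choice.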

rpswenson commented 2 weeks ago

thanks, switching to n_workers=1 worked :)