GoktugAlkan opened this issue 1 year ago

Hello,

Currently, I am trying to convert a large dataset to NWB: we have 60-minute extracellular recordings from 256 electrodes at a 30 kHz sampling rate. Loading the data into Python all at once in order to store it later as an NWB file is very time-consuming. Hence, I switched to the iterative data writing approach, in which subsets are read by a generator and written to the NWB file iteratively.

Although this approach performs better, I was wondering whether parallel computing could improve performance further: the generator contains a for-loop in which chunks of data are read on each iteration. Is it possible to distribute these iterations over multiple cores (parallel computing)? Or does this not make sense? If it does make sense, is there documentation that could help with implementing it?

Many thanks in advance!
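For concreteness, the generator-based iterative write described above follows roughly this pynwb pattern. Everything below is an illustrative sketch, not the actual conversion code: read_block, the block size, and the metadata are placeholders, and a plain TimeSeries stands in for a full ElectricalSeries with an electrode table.

```python
import numpy as np
from datetime import datetime
from dateutil import tz
from hdmf.data_utils import DataChunkIterator
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

N_CHANNELS = 256
RATE_HZ = 30_000.0
BLOCK_SAMPLES = 30_000           # one second of data per block
N_BLOCKS = 60 * 60               # 60 minutes of one-second blocks

def read_block(block_index):
    # Stand-in for reading one block from the acquisition files
    return np.zeros((BLOCK_SAMPLES, N_CHANNELS), dtype=np.int16)

def data_generator():
    # Yields one timestep (all 256 channels) at a time; buffer_size
    # below controls how many rows are written to HDF5 per iteration
    for block_index in range(N_BLOCKS):
        yield from read_block(block_index)

data = DataChunkIterator(
    data=data_generator(),
    maxshape=(None, N_CHANNELS),
    dtype=np.dtype("int16"),
    buffer_size=BLOCK_SAMPLES,   # one full block per HDF5 write
)

nwbfile = NWBFile(
    session_description="raw extracellular recording",
    identifier="session-001",
    session_start_time=datetime.now(tz.tzlocal()),
)
nwbfile.add_acquisition(
    TimeSeries(name="raw_ephys", data=data, unit="volts", rate=RATE_HZ)
)
with NWBHDF5IO("session-001.nwb", mode="w") as io:
    io.write(nwbfile)
```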
@GoktugAlkan we've toyed with this idea using MPI but have never written a full-fledged conversion using parallelization. Instead, we have focused on parallelizing over sessions, which is easier and scales better. Parallelizing within a session could be helpful. MPI is better for distributing compute, and whether this could feasibly improve I/O will depend on the computer architecture. If you want to try this, take a look at this tutorial: https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/parallelio.html#sphx-glr-tutorials-advanced-io-parallelio-py
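In case it helps later readers, the core pattern from that tutorial, condensed into a sketch: one rank creates the file and pre-allocates an empty dataset, then every rank writes its own slice. This assumes h5py built with MPI support and is run with something like `mpiexec -n 4 python script.py`; the dataset shape and names are illustrative.

```python
from datetime import datetime

import numpy as np
from dateutil import tz
from mpi4py import MPI
from hdmf.backends.hdf5.h5_utils import H5DataIO
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

rank = MPI.COMM_WORLD.rank
fname = "parallel_write.nwb"

# Rank 0 creates the file and pre-allocates an empty dataset;
# no data are written yet.
if rank == 0:
    nwbfile = NWBFile(
        session_description="parallel write demo",
        identifier="demo",
        session_start_time=datetime.now(tz.tzlocal()),
    )
    data = H5DataIO(shape=(4,), maxshape=(4,), dtype=np.dtype("int32"))
    nwbfile.add_acquisition(
        TimeSeries(name="ts", data=data, unit="n/a", rate=100.0)
    )
    with NWBHDF5IO(fname, mode="w") as io:
        io.write(nwbfile)
MPI.COMM_WORLD.Barrier()

# Every rank opens the same file and writes its own slice of the dataset.
with NWBHDF5IO(fname, mode="a", comm=MPI.COMM_WORLD) as io:
    nwbfile = io.read()
    nwbfile.acquisition["ts"].data[rank] = rank
```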
Thank you! A few days ago, I realized that I should use MATLAB instead of Python to convert the data, because it is an NS6 file, for which Blackrock provides a MATLAB toolbox.
Hence, I would like to know whether you have a similar tutorial for MatNWB. I saw the MatNWB tutorial on iterative data writing (https://neurodatawithoutborders.github.io/matnwb/tutorials/html/dataPipe.html), but it does not cover parallelizing that process.
Thanks in advance!
@GoktugAlkan If you are converting data from Blackrock ns6 format, I would recommend you try NeuroConv. See the tutorial for Blackrock here. This does not use MPI, but does iterate efficiently through large datasets. Let me know if this works for you.
I am not aware of any MATLAB functions that would allow you to use MPI with HDF5.
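A minimal sketch of the NeuroConv route for an ns6 file; the file path is a placeholder, and whether session_start_time is recovered automatically from the Blackrock header may vary:

```python
from neuroconv.datainterfaces import BlackrockRecordingInterface

# Placeholder path to the raw Blackrock file
interface = BlackrockRecordingInterface(file_path="path/to/session.ns6")

metadata = interface.get_metadata()
# If the session start time is not recovered from the file header,
# set it manually before converting:
# from datetime import datetime
# metadata["NWBFile"]["session_start_time"] = datetime(2023, 1, 1, 12, 0)

# Streams the recording to disk in chunks rather than loading it whole
interface.run_conversion(nwbfile_path="session.nwb", metadata=metadata)
```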
Thanks! Unfortunately, our acquisition system splits the channels and the recording in an unexpected manner, so we want to customize the conversion. Still, it may be helpful to see how NeuroConv implements this so that I can adjust my code to increase its efficiency.
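For what it's worth, NeuroConv's efficient iteration is built on hdmf's GenericDataChunkIterator, which you can subclass for a custom file layout. A condensed sketch of that pattern; read_samples is a hypothetical stand-in for your own reader:

```python
import numpy as np
from hdmf.data_utils import GenericDataChunkIterator

def read_samples(selection):
    # Hypothetical stand-in: read the requested (time, channel) slice
    # from your custom file layout and return it as a numpy array.
    n_rows = selection[0].stop - selection[0].start
    n_cols = selection[1].stop - selection[1].start
    return np.zeros((n_rows, n_cols), dtype=np.int16)

class CustomRecordingIterator(GenericDataChunkIterator):
    """Streams a (n_samples, n_channels) recording without loading it whole."""

    def __init__(self, n_samples, n_channels, **kwargs):
        self._n_samples = n_samples
        self._n_channels = n_channels
        super().__init__(**kwargs)  # accepts e.g. buffer_gb, chunk_mb

    def _get_data(self, selection):
        return read_samples(selection)

    def _get_maxshape(self):
        return (self._n_samples, self._n_channels)

    def _get_dtype(self):
        return np.dtype("int16")

# The iterator can then be passed as `data` to an ElectricalSeries
# (optionally wrapped in H5DataIO for compression); hdmf pulls buffers
# on demand during io.write.
data = CustomRecordingIterator(n_samples=30_000 * 60 * 60, n_channels=256)
```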
Hello @bendichter,
Just a short question to clarify one point you mentioned: you say that it is possible to parallelize the conversion over sessions, i.e., to convert multiple sessions on multiple cores at the same time. Do I understand that correctly?
Thanks in advance!
@GoktugAlkan we are working on a system to parallelize conversions over sessions in the cloud. You could do this locally with something like Python multiprocessing, but I would imagine you would not get much performance improvement over serial processing, since the bottleneck will be I/O rather than computation.
> but I would imagine you would not get much performance improvement over serial processing, since the bottleneck will be I/O rather than computation.

It all depends on the system; when using the DANDI Hub, I find the I/O bottleneck kicks in at around 8 sessions in parallel, regardless of the number of CPUs.
On personal devices I've seen proportional speedups when using 2-4 cores, and the best speeds when the data and the output file are on an SSD.
It's nothing compared to the cloud deployment in development, but if these files exist only locally, 2 cores still halves the total conversion time and 4 cores still quarters it, so it's not completely negligible.
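For completeness, a minimal sketch of the local version; convert_session and the directory layout are placeholders for your own single-session conversion:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def convert_session(session_dir: Path) -> Path:
    # Placeholder: read one session and write one NWB file.
    out_path = session_dir / "session.nwb"
    # ... run the single-session conversion here ...
    return out_path

if __name__ == "__main__":
    session_dirs = sorted(Path("data").glob("session_*"))
    # A handful of workers is usually enough before disk I/O saturates
    with ProcessPoolExecutor(max_workers=4) as pool:
        for out_path in pool.map(convert_session, session_dirs):
            print(f"wrote {out_path}")
```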
Thanks @CodyCBakerPhD and @bendichter for your answers.
I tried to run the conversion on multiple cores, i.e.:
1.) Iterate with a parfor loop in MATLAB over the sessions.
2.) In each iteration, the session's data is written to an NWB file and exported to a unique file path.
When I do this, I get an error message that types.core.Device cannot be found. When I replace the parfor loop with a normal for loop, I don't get this error message.
Do you have an idea what may be the cause of this problem?
I haven't seen this error before. Could you open an issue on the MatNWB repo with code that reproduces the error and the full traceback?
@bendichter Thanks! Sure, I will do so