AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Progress bar for Visual Coding Neuropixel csv files downloading #2631

Open vigji opened 1 year ago

vigji commented 1 year ago

Describe the use case that is addressed by this feature. I am downloading the Visual Coding Neuropixel data. I use the following code to download the .csv files and metadata:

import os
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache
# The directory part of the path is elided in this report.
manifest_path = os.path.join('/.../mypath', "manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

I got confused at first because this code appears to hang for quite a long time (about 10 minutes) while it downloads all the files. The duration itself is fine: it was the same for the Visual Behavior Neuropixel data, where the units csv is ~150 MB, so it can take some time. The problem is that it happens without any readout of the download progress. It does not take long to figure out, but a less experienced lab mate almost gave up on working with the dataset, thinking that something was wrong with the library.

Describe the solution you'd like When I did the same operation with the Visual Behavior Neuropixel data, nice progress bars were displayed for every file (I guess with the tqdm library, which is already a project requirement). It would be great to implement them for this project cache as well.
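To make the request concrete, here is a minimal sketch of the kind of per-file readout I have in mind. It is not how AllenSDK implements downloads: `stream_with_progress` is a hypothetical helper, the in-memory streams stand in for an HTTP response and a file on disk, and the fallback without tqdm is only there so the sketch runs anywhere.

```python
import io
from typing import BinaryIO, Optional

try:
    from tqdm import tqdm  # progress bar library mentioned in this issue
except ImportError:  # the sketch still copies data if tqdm is missing
    tqdm = None


def stream_with_progress(
    source: BinaryIO,
    dest: BinaryIO,
    total_bytes: Optional[int] = None,
    chunk_size: int = 1024 * 1024,
    description: str = "download",
) -> int:
    """Hypothetical helper: copy source to dest in chunks, updating a bar.

    Returns the number of bytes copied.
    """
    bar = None
    if tqdm is not None:
        # total may be None when the size is unknown; tqdm then shows a counter.
        bar = tqdm(total=total_bytes, unit="B", unit_scale=True, desc=description)
    copied = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty read means end of stream
                break
            dest.write(chunk)
            copied += len(chunk)
            if bar is not None:
                bar.update(len(chunk))
    finally:
        if bar is not None:
            bar.close()
    return copied


if __name__ == "__main__":
    # Stand-in for a real download: 5 MB of zeros copied in 1 MB chunks.
    payload = bytes(5 * 1024 * 1024)
    target = io.BytesIO()
    stream_with_progress(io.BytesIO(payload), target, total_bytes=len(payload),
                         description="units.csv")
```

In a real cache the source would presumably be the streamed HTTP response and `total_bytes` its reported content length, but where that hook belongs in this project cache is exactly what I would need pointers on.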

Do you want to work on this issue? I would be happy to try to figure this out; it would also be a nice way for me to get to know the internals of the library better. I would need some pointers, though: I saw that the two projects subclass completely different classes for caching (ProjectCacheBase and Cache). I do not understand why this is the case, or why the two cache classes are so different given how similar their purposes are, so I would need to understand the background a bit better.

Thank you for all the cool work on this cool dataset!