AllenNeuralDynamics / aind-ephys-pipeline

Code Ocean pipeline for ephys processing with Kilosort2.5
MIT License
5 stars 2 forks source link

device names in NWB broken #21

Open bjhardcastle opened 1 month ago

bjhardcastle commented 1 month ago

@alejoe91

Up until this week, the units table included device names like this:

>>> s.nwb_zarr["units/device_name"][:]
array(['ProbeA', 'ProbeA', 'ProbeA', ..., 'ProbeF', 'ProbeF', 'ProbeF'],
      dtype=object)

Now they've changed to this:

>>> s.nwb_zarr["units/device_name"][:]
array(['Neuropixels 1.0', 'Neuropixels 1.0', 'Neuropixels 1.0', ...,
       'Neuropixels 1.0', 'Neuropixels 1.0', 'Neuropixels 1.0'],
      dtype=object)

...and the names in the devices table look like this:

>>> tuple(s.nwb_zarr["general/devices"].keys()) 
('Neuropixels 1.0', 'Neuropixels 1.0-1', 'Neuropixels 1.0-2', 'Neuropixels 1.0-3', 'Neuropixels 1.0-4', 'Neuropixels 1.0-5')

This broke some downstream processing for us (https://github.com/AllenInstitute/npc_sessions/issues/127) and makes it really difficult to link units to probes. Could you please either fix the names of the devices or, if this was necessary, consider adding an electrode_group_name column to the units table.

Could you also please help us re-run the nwb-generation step of the pipeline for affected sessions?

alejoe91 commented 1 month ago

Sure! It was definitely an unintended change! I'll fix it on Monday

alejoe91 commented 1 month ago

@bjhardcastle I checked the latest asset (this one) ran through the new pipeline and it correctly used the probe name:

>>> nwbfile.units["device_name"][:]
array(['ProbeA', 'ProbeA', 'ProbeA', ..., 'ProbeF', 'ProbeF', 'ProbeF'],
      dtype=object)

Could you point me to an asset with the issue described above? So I can check its provenance. Anyways, running the problematic assets through the current pipeline should produce results with the correct device names!

bjhardcastle commented 1 month ago

Weird, here's one from last week that has the issue: ecephys_644864_2023-02-01_00-00-00_sorted_2024-09-18_07-11-48

I also noticed this one from July, so maybe it's not due to a recent change: ecephys_668755_2023-08-28_13-06-40_sorted_2024-07-14_09-15-26

alejoe91 commented 1 month ago

Thanks. So this is not related to the pipeline, but rather to the way that the probe is loaded by probeinterface at the time of compression + upload. I can get these two raw assets fixed and re-trigger the new pipeline on them.

bjhardcastle commented 1 month ago

So the raw data assets have a different organization because of this?

I don't think it's limited to these two - they're just the first two I pulled up today... If I know what to look for in the raw data asset I can get a full list of affected sessions.

bjhardcastle commented 1 month ago

@alejoe91 could you please let us know how to identify raw data assets with this issue? Thanks

alejoe91 commented 1 month ago

yes, so you should traverse all sessions on s3, open a .zarr compressed folder with rec = si.read_zarr(...), and check the rec.get_annotation("probes_info"). This will be a list of 1 element with a dict name and probe_model. name should be ProbeA/B/....

If you give me a few days, I can take care of this. Not sure how urgent it is on your side though

bjhardcastle commented 1 month ago

It's always somewhat urgent, but a few days is ok, and I can help.

I checked all the uploaded ecephys data and put together a table of paths that need updating, with correct probe names - working is shown in the notebooks here: https://codeocean.allenneuraldynamics.org/capsule/4671738/tree

It's quite a lot that have the model name or None as their probe name (~1/3 of all probes). Seems like we should have noticed this affecting more than a couple of sessions... Do you sometimes fallback to parsing the path name instead?

alejoe91 commented 1 month ago

@bjhardcastle thanks for compiling the liet. This is due to probeinterface and was fixe by this release. In fact I don't see any issues of this type after November 10th 2023, which is probably when the new probeinterface release was deployed to the transfer service.

What we can do is:

Let me know if that sounds ok

bjhardcastle commented 1 month ago
alejoe91 commented 1 month ago

We have fixed probe info and other stuff in the zarr s3 a few times in the past :) Changes happen!