dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

variableMeasured not being populated for icephys #1016

Open bendichter opened 2 years ago

bendichter commented 2 years ago

e.g. https://dandiarchive.org/dandiset/000109 contains the neurodata type VoltageClampSeries. This is supposed to add VoltageClampSeries to the variableMeasured attribute of the assetsSummary; however, this field is not populated. This problem shows up in many of the icephys datasets, and does not appear to be a problem for ecephys or ophys.

This is a roadblock for me in generating proper figures summarizing dandisets for the DANDI paper.
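For context, here is a minimal sketch of the kind of aggregation presumably involved (hypothetical code, not dandischema's actual implementation): the summary's variableMeasured is built up from the per-asset metadata, so if extraction never records VoltageClampSeries on the individual assets, the dandiset-level field stays empty.

```python
def summarize_variable_measured(assets):
    """Aggregate per-asset variableMeasured entries into a sorted union.
    Hypothetical illustration of an assetsSummary-style rollup; not the
    actual dandischema code."""
    seen = set()
    for asset in assets:
        seen.update(asset.get("variableMeasured", []))
    return sorted(seen)

# One asset carries the type; another's extraction missed it entirely:
assets = [
    {"variableMeasured": ["VoltageClampSeries"]},
    {},  # icephys asset whose metadata extraction produced nothing
]
print(summarize_variable_measured(assets))  # -> ['VoltageClampSeries']
```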

satra commented 2 years ago

the metadata extraction wasn't available in the earliest days of the archive, and that's an old dandiset. was it run through the current cli? can you re-upload the asset with the current cli with --existing replace (i think) and see if it changes?

bendichter commented 2 years ago

"replace" is not an option. The options are: error|skip|force|overwrite|refresh. Do you mean "force"?

satra commented 2 years ago

i think overwrite

satra commented 2 years ago

if it works the asset metadata should contain the voltageclampseries.

satra commented 2 years ago

you can also test by using dandi ls locally on the nwb file

bendichter commented 2 years ago

dandi ls seems to work:

(base) MacBook-Pro-3:sub-mouse-AAYYT bendichter$ dandi ls sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2_icephys.nwb
- age: P148D
  cell_id: 20180420_sample_2
  date_of_birth: 2017-11-23 00:00:00-05:00
  experiment_description: Current clamp square
  experimenter:
  - Matteo Bernabucci
  genotype: wt/wt
  identifier: dcf5e2ab-2b96-492f-beb8-1f4dc6fabd52
  institution: Baylor College of Medicine
  keywords: null
  lab: Tolias
  nd_types:
  - CreSubject
  - CurrentClampSeries (79)
  - CurrentClampStimulusSeries (79)
  - DandiIcephysMetadata
  - Device (161)
  - IZeroClampSeries
  - IntracellularElectrode (160)
  - SweepTable
  number_of_electrodes: 0
  number_of_units: 0
  nwb_version: 2.2.5
  path: sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2_icephys.nwb
  related_publications: null
  session_description: Current clamp square
  session_id: 20180420_sample_2
  session_start_time: 2018-04-20 00:00:00-04:00
  sex: F
  size: 9337840
  slice_id: 20180420_slice_2
  species: null
  subject_id: mouse_AAYYT

Here is the description of the options:

What to do if a file is found existing on the server: 'skip' would skip the file; 'force' - force reupload; 'overwrite' - force upload if either size or modification time differs; 'refresh' - upload only if local modification time is ahead of the remote.

overwrite does not do anything because there is no difference in size or modification time. Should I try force?

satra commented 2 years ago

Should I try force?

seems like that's the only option.

bendichter commented 2 years ago

yes, that works! There are ~15 datasets that need this. Should I go through and download then re-upload them?

satra commented 2 years ago

Should I go through and download then re-upload them?

that would be the simplest option.

since this process only updates the metadata and not the blobs, it should be possible to do this more programmatically by using remote reading of the nwb file. but i don't think there is an "easy" option for extracting metadata from a remote file. @yarikoptic @jwodder any suggestions?
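For concreteness, the download/re-upload route discussed above might look roughly like this (a hedged sketch using dandiset 000109 from earlier in the thread as the example; flags as in the help text quoted above):

```shell
# Hedged sketch of the fix-by-reupload workflow; requires write
# permission on the dandiset and a configured DANDI API key.
dandi download DANDI:000109
cd 000109
# --existing force re-uploads even when size/mtime match, so the
# current CLI re-extracts the metadata during upload.
dandi upload --existing force
```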

yarikoptic commented 2 years ago

https://github.com/datalad/datalad-fuse if datalad version of dandiset is up to date (we might be a few days behind due to bottleneck...)

bendichter commented 2 years ago

how would this help with the metadata issue? Are you proposing that we continue to download and then re-upload, but do it instead through datalad?

yarikoptic commented 2 years ago

If you use datalad-fuse and mount that dandiset, you don't need to download the entire content: dandi upload would extract the metadata, and only the needed blocks would be downloaded... But if size isn't an issue, I guess download/re-upload is indeed an option

bendichter commented 2 years ago

I fixed the ones that I could, but the issue is that I don't have write permission for most of the dandisets that have this problem. Also, several of them are quite recent. Is it possible that users are using an older version of the dandi-cli that is not properly extracting metadata?

yarikoptic commented 2 years ago

IIRC we add generatedBy metadata (or something like it) with the version of the dandi CLI; worth checking... Need to look at the code - could it be pynwb-version related?

bendichter commented 2 years ago

I doubt it. Those types haven't changed since 2.0. Also, re-uploading the exact same files fixes the metadata issue on DANDI.

satra commented 2 years ago

i don't think we have retroactively invalidated prior versions, so some of the changes in the latest cli with respect to metadata extraction (e.g., the fix that addressed neurodata types with parentheses) are not reflected in older uploads; hence it is possible people are using a version where the metadata is not as complete.
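As an illustration of the parenthesis issue mentioned above: entries in the dandi ls output earlier, like CurrentClampSeries (79), carry an instance count that has to be stripped before the bare type name can land in variableMeasured. A hedged sketch (hypothetical helper; the real dandi-cli extraction code may differ):

```python
import re

def strip_instance_counts(nd_types):
    """Drop trailing '(N)' instance counts from nd_types entries,
    e.g. 'CurrentClampSeries (79)' -> 'CurrentClampSeries'.
    Hypothetical helper illustrating the fix mentioned above; not
    dandi-cli's actual code."""
    return [re.sub(r"\s*\(\d+\)\s*$", "", name) for name in nd_types]

print(strip_instance_counts(["CurrentClampSeries (79)", "SweepTable"]))
# -> ['CurrentClampSeries', 'SweepTable']
```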

regarding updating asset metadata, ideally this is something someone with admin privileges + datalad could perhaps do for all dandisets ;)

yarikoptic commented 2 years ago

@bendichter does this issue still persist? I feel like we might want some generic helper (thus possibly in the client, as one of the service-commands) to re-extract and re-upload metadata using remote access (via fsspec, so it works for any file) for specific dandisets/assets. I feel like we already did something like that in the glorious past, but I bet it relied on the data copy on drogon.