bendichter opened this issue 2 years ago
the metadata extraction wasn't available in the earliest days of the archive. that's an old dandiset. was it run through the current cli? can you re-upload the asset with the current cli with --existing replace (i think), and see if it changes?
"replace" is not an option. The options are: error|skip|force|overwrite|refresh. Do you mean "force"?
i think overwrite
if it works, the asset metadata should contain the VoltageClampSeries.
you can also test by using dandi ls locally on the nwb file
dandi ls seems to work:
```
(base) MacBook-Pro-3:sub-mouse-AAYYT bendichter$ dandi ls sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2_icephys.nwb
- age: P148D
  cell_id: 20180420_sample_2
  date_of_birth: 2017-11-23 00:00:00-05:00
  experiment_description: Current clamp square
  experimenter:
  - Matteo Bernabucci
  genotype: wt/wt
  identifier: dcf5e2ab-2b96-492f-beb8-1f4dc6fabd52
  institution: Baylor College of Medicine
  keywords: null
  lab: Tolias
  nd_types:
  - CreSubject
  - CurrentClampSeries (79)
  - CurrentClampStimulusSeries (79)
  - DandiIcephysMetadata
  - Device (161)
  - IZeroClampSeries
  - IntracellularElectrode (160)
  - SweepTable
  number_of_electrodes: 0
  number_of_units: 0
  nwb_version: 2.2.5
  path: sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2_icephys.nwb
  related_publications: null
  session_description: Current clamp square
  session_id: 20180420_sample_2
  session_start_time: 2018-04-20 00:00:00-04:00
  sex: F
  size: 9337840
  slice_id: 20180420_slice_2
  species: null
  subject_id: mouse_AAYYT
```
Here is the description of the options:
What to do if a file found existing on the server. 'skip' would skip the file, 'force' - force reupload, 'overwrite' - force upload if either size or modification time differs; 'refresh' - upload only if local modification time is ahead of the remote.
overwrite does not do anything because there is no difference in size or modification time. Should I try force?
seems like that's the only option.
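If it helps, a rough sketch of that forced re-upload, including the download step for dandisets not already on disk. The dandiset ID below is a placeholder, and the dandi CLI plus a configured API key are assumed:

```python
import subprocess
from pathlib import Path

# Placeholder list; the affected dandisets would go here
dandisets = ["000109"]

for dandiset_id in dandisets:
    # Download the dandiset (dandiset.yaml + blobs) into ./<dandiset_id>/
    subprocess.run(["dandi", "download", f"DANDI:{dandiset_id}"], check=True)
    # Re-upload in place; --existing force makes the client re-extract and push
    # asset metadata even though size/mtime are unchanged on the server
    subprocess.run(
        ["dandi", "upload", "--existing", "force"],
        cwd=Path(dandiset_id),
        check=True,
    )
```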
yes, that works! There are ~15 datasets that need this. Should I go through and download then re-upload them?
that would be the simplest option.
since this process only updates the metadata and not the blobs, it is possible to do this more programmatically by using remote reading of the nwb file. but i don't think there is an "easy" option for extracting metadata from a remote file. @yarikoptic @jwodder any suggestions?
https://github.com/datalad/datalad-fuse if datalad version of dandiset is up to date (we might be a few days behind due to bottleneck...)
how would this help with the metadata issue? Are you proposing that we continue to download and then re-upload, but do it instead through datalad?
If you use datalad-fuse and mount that dandiset, you don't need to download the entire content. dandi upload would extract metadata and only the needed blocks would be downloaded... But if size isn't an issue, I guess download/re-upload is indeed an option.
I fixed the ones that I could but the issue is that I don't have write permission for most of the dandisets that have this problem. Also, several of them are quite recent. Is it possible that users are using an older version of the dandi-cli that is not properly extracting metadata?
IIRC we add generatedBy or similar metadata with the version of the dandi CLI, worth checking... Need to look at the code - could it be pynwb version related?
I doubt it. Those types haven't changed since 2.0. Also reposting the exact same files fixes the metadata issue on DANDI
i don't think we have marked prior versions as bad, so some of the changes in the latest cli with respect to metadata extraction (e.g. the issue that addressed neural data types with parentheses) are not in place for them, and hence it is possible people are using a version where the metadata is not as complete.
regarding updating asset metadata, ideally this is something someone with admin privileges + datalad can do perhaps for all dandisets ;)
@bendichter does this issue still persist? I feel like we might want some generic helper (thus possibly in the client, as one of the service-commands) to re-extract and re-upload metadata using remote access (via fsspec, so it works for any file) for specific dandisets/assets. I feel like we already did something like that in the glorious past, but I bet it relied on a data copy on drogon.
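A rough sketch of what such a helper could look like, combining the dandi Python client with fsspec so that only the HDF5 chunks pynwb actually reads get fetched. Everything here is an assumption rather than an existing service command: the use of get_content_url() to obtain a direct S3 URL, opening the remote file through fsspec/h5py, and set_raw_metadata() as the call for pushing refreshed metadata back.

```python
import fsspec
import h5py
from pynwb import NWBHDF5IO
from dandi.dandiapi import DandiAPIClient


def remote_nd_types(s3_url):
    """List neurodata types in a remote NWB file without downloading it."""
    # fsspec only pulls the byte ranges that h5py actually touches
    with fsspec.open(s3_url, "rb") as f:
        with h5py.File(f, "r") as h5:
            with NWBHDF5IO(file=h5, mode="r", load_namespaces=True) as io:
                nwbfile = io.read()
                return sorted({type(o).__name__ for o in nwbfile.objects.values()})


def refresh_dandiset_metadata(dandiset_id, version="draft"):
    # An API key (e.g. DANDI_API_KEY) would be needed to actually modify assets
    with DandiAPIClient() as client:
        for asset in client.get_dandiset(dandiset_id, version).get_assets():
            url = asset.get_content_url(follow_redirects=1, strip_query=True)
            print(asset.path, remote_nd_types(url))
            # A real helper would rebuild the full asset metadata here and push
            # it back with something like asset.set_raw_metadata(...) (assumed),
            # so that assetsSummary.variableMeasured gets repopulated.


refresh_dandiset_metadata("000109")  # example dandiset from this issue
```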
e.g. https://dandiarchive.org/dandiset/000109 contains the neurodata type VoltageClampSeries. This is supposed to add VoltageClampSeries to the variableMeasured attribute of the assetsSummary (here); however, this field is not populated. This problem shows up in many of the icephys dandisets, and does not appear to affect ecephys or ophys.
This is a roadblock for me in generating proper figures summarizing dandisets for the DANDI paper.
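For checking whether a given dandiset is affected, something along these lines should work (assuming the dandi Python client exposes the raw dandiset metadata as a dict; the dandiset ID is the example above):

```python
from dandi.dandiapi import DandiAPIClient

# Print assetsSummary.variableMeasured for a dandiset; for the affected
# icephys dandisets this comes back empty even though the NWB files contain
# VoltageClampSeries / CurrentClampSeries.
with DandiAPIClient() as client:
    md = client.get_dandiset("000109", "draft").get_raw_metadata()
    print(md.get("assetsSummary", {}).get("variableMeasured"))
```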