dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

Automatic File Naming Misses Some Sessions in 000971 #1492

Open pauladkisson opened 5 months ago

pauladkisson commented 5 months ago

Some of the file names in Dandiset 000971 do not contain the suffix "behavior", even though those files contain a processing module named behavior just like all the other sessions.

For example, see sub-89-247_ses-FP-PR-2019-03-08T10-59-10.nwb

yarikoptic commented 3 months ago

We would need more information on how you got those file names -- using dandi organize? (then we should transfer the issue there etc)

bendichter commented 3 months ago

Yes, I believe this came about using dandi organize, and we should move this to the dandi cli repo.

DANDI CLI determines these names based on the presence of neurodata types. However, this misses some cases where the data types of the file contents can be determined in other ways. Here, there is a processing module named "behavior" that contains only an events data type, which is generic and can hold different types of data. The events object is named "left nose poke times", so it clearly holds behavioral data, but neither the processing module nor the events data object is behavior-specific, so the DANDI CLI does not label this file as having behavioral data.

The solution I think Paul is suggesting here is to have the DANDI CLI infer that a file contains behavioral data if it contains a processing module named "behavior". This type of processing module comes up a lot because it is one of our recommended names for processing modules: "behavior", "ecephys", "ophys", etc. This is indicated in our best practices document here: https://nwbinspector.readthedocs.io/en/dev/best_practices/nwbfile_metadata.html?highlight=processing#processing-module-names

The question is, do we want to use these types of heuristics to determine the file contents, or do we want to stick to neurodata types?
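
To make the trade-off concrete, here is a rough sketch of what the heuristic could look like. It is not taken from the dandi-cli code base: `infer_modalities`, `TYPE_TO_MODALITY`, and `RECOMMENDED_MODULE_NAMES` are hypothetical names, the type-to-modality mapping is only illustrative, and `"Events"` stands in for whatever the generic events type is actually called in these files.

```python
# Hypothetical sketch; none of these names come from dandi-cli itself.

# Illustrative (incomplete) mapping from neurodata types to a modality label.
TYPE_TO_MODALITY = {
    "Position": "behavior",
    "ElectricalSeries": "ecephys",
    "TwoPhotonSeries": "ophys",
}

# Only the module names mentioned in this thread; the best practices
# document lists the full set of recommended names.
RECOMMENDED_MODULE_NAMES = {"behavior", "ecephys", "ophys"}


def infer_modalities(neurodata_types, processing_module_names,
                     use_module_names=True):
    """Return a sorted list of modality labels for one file.

    neurodata_types: names of the neurodata types found in the file.
    processing_module_names: names of the file's processing modules.
    use_module_names: if True, a recommended module name also counts
        as evidence for that modality (the heuristic under discussion).
    """
    # Current-style behavior: labels come from neurodata types only.
    modalities = {
        TYPE_TO_MODALITY[t] for t in neurodata_types if t in TYPE_TO_MODALITY
    }
    # Proposed heuristic: also trust recommended processing module names.
    if use_module_names:
        modalities |= {
            name for name in processing_module_names
            if name in RECOMMENDED_MODULE_NAMES
        }
    return sorted(modalities)


# The situation described above: a generic events type inside a module
# named "behavior".
types_only = infer_modalities(["Events"], ["behavior"], use_module_names=False)
with_heuristic = infer_modalities(["Events"], ["behavior"])
```

With types only, the file gets no behavior label; with the module-name heuristic, it does. Keeping the heuristic behind a switch like `use_module_names` would let us compare both behaviors before deciding.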