question about naming when a spike sorting is included as a separate NWB file

magland commented 1 year ago

Hello DANDI team!

I have a situation where I'd like to upload the results of spike sorting as a separate NWB file from the one that contains the raw ephys traces. The reason I would like to do this is that I'd like to put the ephys data online first, and then perform spike sorting by streaming that data down into the sorting process. I don't want to then add the result to the original file, because then I'd need to re-upload the new file, which could be very large. Another reason for using a separate file is that I might want to do this more than once, for different sorting algorithms.

So I'm going to run into a naming problem, because the auto-assigned name is going to be the same (it's based on the session, etc). In that case, I realize that a checksum string will be added to the filename to distinguish it. But that's still not ideal because the name will not indicate which one has the spike sorting result. Ideally the name would have a helpful string in it such as "sorting" or "kilosort".

Wondering what you would recommend. Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?

Thanks in advance!

CodyCBakerPhD commented 1 year ago

> Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?

dandi organize is just a helper that arranges contents in a layout compatible with dandi validate.

If running dandi organize does not make use of the session ID as ses-{session_id} (say, when run on a single file in isolation from other dandiset contents), then you can just manually add ses-{session_id} to the filename; this is exactly how the automatic dandi upload helper function in NeuroConv works.

I point this out because what I would do is just append -{name_of_sorter} to the session ID of the file, which will then show up in the name of the file as well.
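A minimal sketch of that suggestion in Python, for illustration only. The helper name `sorted_session_filename` and the example subject, session, and sorter values are hypothetical; the only thing taken from the comment above is the idea of appending -{name_of_sorter} to the session ID so that it appears in the ses- part of the filename. Writing the tagged value into the session ID stored inside the NWB file is left out.

```python
def sorted_session_filename(subject_id: str, session_id: str, sorter: str) -> str:
    """Build an NWB filename whose session ID carries the sorter name.

    Hypothetical helper: it only formats a string and does not call dandi.
    """
    # Append the sorter name to the session ID, as suggested above.
    tagged_session_id = f"{session_id}-{sorter}"
    return f"sub-{subject_id}_ses-{tagged_session_id}.nwb"


print(sorted_session_filename("mouse1", "20230101", "kilosort"))
```

Running the same function once per sorter gives one distinct filename per spike sorting result for the same session.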

See https://github.com/dandi/dandi-cli/issues/1265 for a more detailed discussion of the similar topic of separating raw from processed files. We're currently experimenting with different approaches, with examples in https://dandiarchive.org/dandiset/000568?pos=3 and https://dandiarchive.org/dandiset/000552?pos=4

yarikoptic commented 1 year ago

Believe it or not, I am thrilled to hear all your arguments for storing raw and processed (spike-sorted) data in different .nwb files -- that is how I kept suggesting it should be done, so "great minds think alike" ;)

dandi organize is just a helper; using it is not mandatory. As long as the naming of folders and files follows either the DANDI convention (the output of dandi organize) or the BIDS convention, we should be good!
- DANDI convention: we use only a set of fields based on the metadata we extract from NWB files - https://github.com/dandi/dandi-cli/blob/HEAD/dandi/consts.py#L189 . ATM there is indeed no "semantic" field that would meaningfully distinguish raw from spike-sorted files (although there is an _obj- field).
- At the BIDS level, work to formalize animal ephys data is still ongoing within https://bids.neuroimaging.io/bep032 . AFAIK it has not yet gotten to "spike sorted" data. Since it is a common case, I would expect some dedicated entity or even a suffix (e.g. _units) to annotate such files. I left a comment/question in the BEP032 Google doc. I see a meeting coming up next Wed (right @SylvainTakerkart?), so maybe we could briefly discuss it there. Meanwhile, we could introduce both a suffix (_units?) and the use of _desc- entities (so e.g., sub-mice1_ses-1_ephys.nwb and sub-mice1_ses-1_desc-kilosort1_units.nwb), and see if we could teach dandi organize to populate them automagically. Do you have some sample files (raw + 2 different spike sorting ones)?

edit 1: fixed typos and added an example
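To make the proposed pattern concrete, here is a small Python sketch that assembles names from sub-, ses-, and optional desc- entities plus a suffix. The function name `bids_like_name` is hypothetical, and both the _desc- entity and the _units suffix are only proposals at this point in the thread, not something dandi validate is known to accept.

```python
from typing import Optional


def bids_like_name(subject: str, session: str, suffix: str,
                   desc: Optional[str] = None) -> str:
    """Join sub-/ses-/desc- entities with underscores and end with the suffix.

    Hypothetical helper illustrating the proposed naming; it does not call dandi.
    """
    entities = [f"sub-{subject}", f"ses-{session}"]
    if desc is not None:
        # The desc- entity is optional and only added for derived files.
        entities.append(f"desc-{desc}")
    return "_".join(entities + [suffix]) + ".nwb"


raw_name = bids_like_name("mice1", "1", "ephys")
sorted_name = bids_like_name("mice1", "1", "units", desc="kilosort1")
print(raw_name)
print(sorted_name)
```

The two calls reproduce the example names given in the comment above.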

CodyCBakerPhD commented 1 year ago

Oh, that reminds me - the only caveat is that the session ID cannot contain underscores, since those are used as separator characters in the DANDI filename convention; I just replace them with dashes usually
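A one-line Python sketch of that replacement, assuming the session ID is a plain string. The function name `sanitize_session_id` and the sample ID are made up for illustration.

```python
def sanitize_session_id(session_id: str) -> str:
    """Replace underscores with dashes so the ID does not clash with
    the underscore separators used in DANDI filenames."""
    return session_id.replace("_", "-")


print(sanitize_session_id("m108_191125_163508"))
```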

magland commented 1 year ago

@yarikoptic that makes sense.

I prepared a file called sub-paired-english/sub-paired-english_ses-paired-english-m108-191125-163508_desc-ms5-units_ecephys.nwb

and I tried to upload with the cli using

dandi upload

But I get an error because the name does not conform. Is there a different way I can upload?

yarikoptic commented 1 year ago

> But I get an error because the name does not conform. Is there a different way I can upload?

It will not conform until we allow for the _desc field. Just disable validation for now. What API do you use for upload, and what error do you get?
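For reference, a hedged sketch of what disabling validation could look like from Python. It assumes that `--validation skip` is the relevant `dandi upload` option in the installed dandi-cli version, which is worth confirming with `dandi upload --help`. The sketch only prints the command and does not run it, so nothing is uploaded.

```python
import shlex

# Assumption: `--validation skip` is the dandi-cli option that turns off
# validation during upload; verify against `dandi upload --help`.
command = ["dandi", "upload", "--validation", "skip"]

# Print the command instead of executing it, so this sketch has no side effects.
print(shlex.join(command))
```

To actually run it, the same list could be passed to `subprocess.run(command, check=True)` from inside the local dandiset folder.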

magland commented 1 year ago

Thanks, I have disabled validation and then the command went through. I have the example data here!

https://dandiarchive.org/dandiset/000618/draft/files?location=sub-paired-english

You can view the raster plot in neurosift.

yarikoptic commented 1 year ago

Oh, neurosift is nice! But it seems to me that I can't see anything interesting for units -- please guide me:

Maybe the errors in the console are relevant?

yarikoptic commented 1 year ago

on 2nd try, when I clicked right away on "raster plot" it worked!

magland commented 1 year ago

> on 2nd try, when I clicked right away on "raster plot" it worked!

Great! You can also click on autocorrelograms.

yarikoptic commented 1 year ago

support for _desc should come in https://github.com/dandi/dandi-cli/pull/1315 . I think, as it is a very generic and useful entity in BIDS, we should adopt it too. Yet to see if it would be feasible for dandi organize to automagically figure some label though. Ideas?

magland commented 1 year ago

> support for _desc should come in #1315 . I think, as it is a very generic and useful entity in BIDS, we should adopt it too. Yet to see if it would be feasible for dandi organize to automagically figure some label though. Ideas?

Maybe there could be an optional dandi_desc attribute in the NWB file? But maybe it shouldn't have the word "dandi", not sure.

question about naming when a spike sorting is included as a separate NWB file #1314