dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

question about naming when a spike sorting is included as a separate NWB file #1314

Open magland opened 1 year ago

magland commented 1 year ago

Hello DANDI team!

I have a situation where I'd like to upload the results of spike sorting as a separate NWB file from the one that contains the raw ephys traces. The reason I would like to do this is that I'd like to put the ephys data online first, and then perform spike sorting by streaming that data down into the sorting process. I don't want to then add the result to the original file, because then I'd need to re-upload the new file, which could be very large. Another reason for using a separate file is that I might want to do this more than once, for different sorting algorithms.

So I'm going to run into a naming problem, because the auto-assigned name is going to be the same (it's based on the session, etc). In that case, I realize that a checksum string will be added to the filename to distinguish it. But that's still not ideal because the name will not indicate which one has the spike sorting result. Ideally the name would have a helpful string in it such as "sorting" or "kilosort".

Wondering what you would recommend. Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?

Thanks in advance!

CodyCBakerPhD commented 1 year ago

Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?

dandi organize is just a helper that arranges files into a layout compatible with dandi validate.

If running dandi organize does not make use of the session ID as ses-{session_id} (say, when run on a single file in isolation from other dandiset contents), then you can just manually add ses-{session_id} to the filename; this is exactly how the automatic dandi upload helper function in NeuroConv works.

I point this out because what I would do is just append -{name_of_sorter} to the session ID of the file, which will then show up in the name of the file as well.
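A minimal sketch of that suggestion, assuming the sub-{subject_id}_ses-{session_id}_ecephys.nwb pattern seen elsewhere in this thread. The helper name filename_with_sorter and the example values are hypothetical and not part of dandi-cli or NeuroConv:

```python
def filename_with_sorter(subject_id: str, session_id: str, sorter: str,
                         suffix: str = "ecephys") -> str:
    """Build a DANDI-style file name with the sorter appended to the session ID.

    Hypothetical helper for illustration only; it is not a dandi-cli function.
    """
    # Append "-{name_of_sorter}" to the session ID, as suggested in the comment.
    session_with_sorter = f"{session_id}-{sorter}"
    return f"sub-{subject_id}_ses-{session_with_sorter}_{suffix}.nwb"


# Example values are made up for the illustration.
print(filename_with_sorter("mouse1", "191125-163508", "kilosort"))
```

Because the sorter name becomes part of the session ID, each sorting run gets a distinct, readable file name without relying on a checksum string.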

See https://github.com/dandi/dandi-cli/issues/1265 for a more detailed discussion of the similar topic of separating raw from processed files. We're currently experimenting with different approaches to that, with examples in https://dandiarchive.org/dandiset/000568?pos=3 and https://dandiarchive.org/dandiset/000552?pos=4

yarikoptic commented 1 year ago

Believe it or not, I am thrilled to hear all your arguments for storing raw and spike-sorted data in different .nwb files. That is how I kept suggesting it should be done, so "great minds think alike" ;)

edit 1: fixed typos and added an example

CodyCBakerPhD commented 1 year ago

Oh, that reminds me - the only caveat is that the session ID cannot contain underscores, since those are used as separator characters in the DANDI filename convention; I just replace them with dashes usually
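A small sketch of that workaround, assuming a plain string replacement is enough. The function name sanitize_session_id and the example IDs are hypothetical:

```python
def sanitize_session_id(session_id: str) -> str:
    """Replace underscores with dashes so the ID cannot be mistaken for
    the separators between filename entities."""
    return session_id.replace("_", "-")


raw_id = "m108_191125_163508"  # made-up session ID containing underscores
name = f"sub-mouse1_ses-{sanitize_session_id(raw_id)}_ecephys.nwb"
print(name)
# Only the separators between entities remain as underscores.
print(name.split("_"))
```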

magland commented 1 year ago

@yarikoptic that makes sense.

I prepared a file called sub-paired-english/sub-paired-english_ses-paired-english-m108-191125-163508_desc-ms5-units_ecephys.nwb

and I tried to upload with the cli using

dandi upload

But I get an error because the name does not conform. Is there a different way I can upload?

yarikoptic commented 1 year ago

But I get an error because the name does not conform. Is there a different way I can upload?

It would not conform until we allow for the _desc field. Just disable validation for now. What API do you use for upload and what error do you get?
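To see which entities the file name above carries, here is a rough sketch that splits it on underscores. It is only an illustration of the naming pattern, not the check that dandi validate actually runs, and split_entities is a hypothetical helper (Python 3.9+):

```python
from pathlib import PurePosixPath


def split_entities(path: str) -> tuple[dict[str, str], str]:
    """Split a DANDI-style file name into its key-value entities and suffix.

    Illustration only: this is not dandi-cli's validation logic.
    """
    stem = PurePosixPath(path).name.removesuffix(".nwb")
    # Entities are separated by "_"; the last part is the modality suffix.
    *pairs, suffix = stem.split("_")
    # Each entity is "key-value"; split on the first dash only.
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix


entities, suffix = split_entities(
    "sub-paired-english/"
    "sub-paired-english_ses-paired-english-m108-191125-163508"
    "_desc-ms5-units_ecephys.nwb"
)
print(entities)
print(suffix)
```

The desc entry in the result is the part that, per the comment above, is not yet accepted.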

magland commented 1 year ago

Thanks, I have disabled validation and then the command went through. I have the example data here!

https://dandiarchive.org/dandiset/000618/draft/files?location=sub-paired-english

You can view the raster plot in neurosift.

yarikoptic commented 1 year ago

Oh, neurosift is nice! But it seems to me I can't see anything interesting for units. Please guide me:

image

Maybe the errors in the console are relevant?

image

yarikoptic commented 1 year ago

on 2nd try, when I clicked right away on "raster plot" it worked!

magland commented 1 year ago

on 2nd try, when I clicked right away on "raster plot" it worked!

Great! You can also click on autocorrelograms.

yarikoptic commented 1 year ago

support for _desc should come in https://github.com/dandi/dandi-cli/pull/1315 . I think, as it is a very generic and useful entity in BIDS, we should adopt it too. Yet to see if it would be feasible for dandi organize to automagically figure some label though. Ideas?

magland commented 1 year ago

support for _desc should come in #1315 . I think, as it is a very generic and useful entity in BIDS, we should adopt it too. Yet to see if it would be feasible for dandi organize to automagically figure some label though. Ideas?

Maybe there could be an optional dandi_desc attribute in the NWB file? But maybe it shouldn't have the word "dandi", not sure.