Open magland opened 1 year ago
`dandi organize` is just a helper that arranges contents in a layout compatible with `dandi validate`.
If running `dandi organize` does not make use of the session ID as `ses-{session_id}` (say, when run on a single file in isolation from other dandiset contents), then you can just manually add `ses-{session_id}` to the filename; this is exactly how the automatic dandi upload helper function in NeuroConv works.
I point this out because what I would do is just append `-{name_of_sorter}` to the session ID of the file, which will then show up in the name of the file as well.
See https://github.com/dandi/dandi-cli/issues/1265 for a more detailed discussion of the similar topic of separating raw from processed files. We're currently experimenting with different approaches; see the examples in https://dandiarchive.org/dandiset/000568?pos=3 and https://dandiarchive.org/dandiset/000552?pos=4
Believe it or not but I am thrilled to hear all your arguments for storing raw and processed spike sorted data in different .nwb files -- that is how I kept suggesting it should be done so "great minds think alike" ;)
`dandi organize` is just a helper; using it is not mandatory. As long as the naming of folders and files follows either the DANDI convention (the output of `dandi organize`) or the BIDS convention, we should be good!
`_obj-` field) which would meaningfully distinguish raw from spike sorted files indeed. `_units`) to annotate files. I left a comment/question in that BEP032 Google doc. I see a meeting coming up next Wed (right @SylvainTakerkart?) so maybe we could briefly discuss. But meanwhile we could introduce both a suffix (`_units`?) and use of `_desc-` entities (so e.g., `sub-mice1_ses-1_ephys.nwb` and `sub-mice1_ses-1_desc-kilosort1_units.nwb`). And see if we could teach `dandi organize` to even automagically populate them? Do you have some sample files (raw + 2 different spike sorting ones)?
edit 1: fixed typos and added an example
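The naming proposed above can be sketched in a few lines of Python. This is only an illustration: `compose_filename` is a hypothetical helper, not part of dandi-cli, and the `_desc-` entity and `_units` suffix are proposals from this thread rather than something `dandi validate` is known to accept.

```python
from typing import Optional


def compose_filename(sub: str, ses: str, suffix: str,
                     desc: Optional[str] = None, ext: str = ".nwb") -> str:
    """Join entities with underscores, BIDS-style (hypothetical helper)."""
    parts = [f"sub-{sub}", f"ses-{ses}"]
    if desc is not None:
        # Optional description entity, e.g. the name of the spike sorter.
        parts.append(f"desc-{desc}")
    # The suffix comes last, just before the extension.
    parts.append(suffix)
    return "_".join(parts) + ext


print(compose_filename("mice1", "1", "ephys"))
print(compose_filename("mice1", "1", "units", desc="kilosort1"))
```

The two calls reproduce the example names given above for the raw file and the spike sorted file.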
Oh, that reminds me - the only caveat is that the session ID cannot contain underscores, since those are used as separator characters in the DANDI filename convention; I just replace them with dashes usually
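A minimal sketch of that workaround, assuming a hypothetical helper `session_id_with_sorter` (not a dandi-cli or NeuroConv function): underscores are replaced with dashes and the sorter name is appended to the session ID.

```python
def session_id_with_sorter(session_id: str, sorter: str) -> str:
    """Build a session ID that is safe for the DANDI filename convention.

    Underscores separate entities in the filename, so they are replaced
    with dashes in both the session ID and the sorter name.
    (Hypothetical helper for illustration only.)
    """
    clean_session = session_id.replace("_", "-")
    clean_sorter = sorter.replace("_", "-")
    return f"{clean_session}-{clean_sorter}"


# The session ID and sorter name here are made-up example values.
ses = session_id_with_sorter("m108_191125_163508", "kilosort")
print(f"ses-{ses}")
```

The resulting string can then be used as the `ses-{session_id}` part of the filename.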
@yarikoptic that makes sense.
I prepared a file called `sub-paired-english/sub-paired-english_ses-paired-english-m108-191125-163508_desc-ms5-units_ecephys.nwb` and I tried to upload it with the CLI using `dandi upload`.
But I get an error because the name does not conform. Is there a different way I can upload?
It would not conform until we allow for the `_desc` field. Just disable validation for now. What API do you use for upload and what error do you get?
Thanks, I have disabled validation and then the command went through. I have the example data here!
https://dandiarchive.org/dandiset/000618/draft/files?location=sub-paired-english
You can view the raster plot in neurosift.
oh neurosift is nice! but can't see anything interesting for units seems to me -- please guide me:
maybe the errors in the console are relevant?
on 2nd try, when I clicked right away on "raster plot" it worked!
Great! You can also click on autocorrelograms.
Support for `_desc` should come in https://github.com/dandi/dandi-cli/pull/1315. I think, as it is a very generic and useful entity in BIDS, we should adopt it too. It remains to be seen whether it would be feasible for `dandi organize` to automagically figure out some label though. Ideas?
Maybe there could be an optional dandi_desc attribute in the NWB file? But maybe it shouldn't have the word "dandi", not sure.
Hello DANDI team!
I have a situation where I'd like to upload the results of spike sorting as a separate NWB file from the one that contains the raw ephys traces. The reason I would like to do this is that I'd like to put the ephys data online first, and then perform spike sorting by streaming that data down into the sorting process. I don't want to then add the result to the original file, because then I'd need to re-upload the new file, which could be very large. Another reason for using a separate file is that I might want to do this more than once, for different sorting algorithms.
So I'm going to run into a naming problem, because the auto-assigned name is going to be the same (it's based on the session, etc). In that case, I realize that a checksum string will be added to the filename to distinguish it. But that's still not ideal because the name will not indicate which one has the spike sorting result. Ideally the name would have a helpful string in it such as "sorting" or "kilosort".
Wondering what you would recommend. Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?
Thanks in advance!