Open · yarikoptic opened this issue 5 years ago
@yarikoptic and @amanelis17, we can look at updating the README.md to address this particular issue.
Normally, data are submitted twice a year (Dec-Jan and Jun-Jul). During these data submission cycles, a new meta-data csv file is created with the new subjects/sessions. These data are submitted to the same NDA collection. When users search for data, they generally find all data matching a particular query, or from a particular collection/study, and package the data that way. So they will get multiple submissions in their package, which together encompass all the currently available data.
Thank you @obenshaindw for the explanation!
> ... a new meta-data csv file is created with the new subjects/sessions.
In the light of BIDS2NDA and possibly other workflows, creating such a new csv file would entail creating one with all the entries for the dataset, running some diff to select only the new entries (while retaining the header), and then providing it to nda-tools. I would say this is a common use case for other workflows as well, since I hope people do not organize the data they analyze locally just to cater to NDA's "incremental" submission requirement. As with any "automation by human actions", this is tedious and bug-prone (forgotten or duplicate entries), and it would be great to automate it...

Suggestion: I am not sure yet whether it should be part of `vtcmd` or some independent helper (e.g. `nda-diff`, ref #7). In the case of `nda-diff`, it could take two csv files (e.g. `submission-201901.csv` and `submission-201907.csv`) and produce a new `submission-201907-incremental.csv`. If coded in a modular fashion, the same internal `nda_diff` function could be triggered by adding a `--since FILE` option to `vtcmd`, which would avoid actually creating the `-incremental.csv` file and just do the diff before generating the package. That would simplify incremental submission and eliminate possible human-caused bugs.
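To make the suggestion concrete, here is a minimal sketch of what such a helper could do (Python, matching nda-tools itself). Everything in it is illustrative: the `nda_diff` name, the assumption of two header lines, and the whole-row comparison are my guesses at a reasonable default, not existing nda-tools behavior.

```python
#!/usr/bin/env python3
"""Sketch of a hypothetical nda-diff helper: keep the header of the new
csv file and write out only the data rows absent from the old one."""
import csv
import sys

# How many leading lines form the header to retain. NDA structure files
# typically start with a short-name/version line followed by the column
# header row; adjust if your files differ (an assumption, not something
# nda-tools enforces).
N_HEADER_LINES = 2


def nda_diff(old_path, new_path, out_path, n_header=N_HEADER_LINES):
    """Write rows of new_path that are not in old_path, keeping the header."""
    with open(old_path, newline="") as f:
        old_rows = list(csv.reader(f))
    with open(new_path, newline="") as f:
        new_rows = list(csv.reader(f))

    # Rows already submitted, compared verbatim: an edited row would
    # therefore count as "new" and be re-submitted.
    already_submitted = {tuple(row) for row in old_rows[n_header:]}

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(new_rows[:n_header])  # retain the header
        for row in new_rows[n_header:]:
            if tuple(row) not in already_submitted:
                writer.writerow(row)


if __name__ == "__main__":
    # e.g.: nda_diff.py submission-201901.csv submission-201907.csv \
    #       submission-201907-incremental.csv
    nda_diff(*sys.argv[1:4])
```

With a modular `nda_diff` function like this, the proposed `--since FILE` option for `vtcmd` would simply call it in memory rather than writing the `-incremental.csv` file first (to be clear, `--since` does not exist yet; it is the suggestion above).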
AFAIK (speaking for myself and @amanelis17), neuroimaging data uploads to NDA need to be incremental. Looking ahead to the next submission, we wondered how we will need to proceed when more subjects/sessions are collected. It would be great if README.md included a clear example of how to proceed in those cases. Ideally, the provided image03.csv file would be analyzed (against the data already on the NDA server) for older (already uploaded) entries, and only the new entries would be uploaded.
Thank you in advance!