NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License

Question: clarify incremental submissions procedure for neuroimaging data #16

Open yarikoptic opened 5 years ago

yarikoptic commented 5 years ago

AFAIK (this goes for both me and @amanelis17), neuroimaging data requires incremental uploads. Looking ahead to the next submission, we wondered how we should proceed once more subjects/sessions have been collected. It would be great if README.md gave a clear example of how to proceed in those cases. Ideally, the provided image03.csv file would be checked against NDA server-side data for older (already uploaded) entries, so that only new entries get uploaded.

Thank you in advance!

obenshaindw commented 5 years ago

@yarikoptic and @amanelis17, we can look into updating the README.md on this particular issue.

Normally data is submitted twice a year (Dec-Jan and Jun-Jul). During these data submission cycles, a new meta-data csv file is created with the new subjects/sessions. These data are submitted to the same NDA collection. When users find data, they generally will find all data matching a particular query, or from a particular collection/study and package the data that way. So they will get multiple submissions in their package, which encompass all the currently available data.

yarikoptic commented 5 years ago

Thank you @obenshaindw for the explanation!

... a new meta-data csv file is created with the new subjects/

In light of BIDS2NDA and possibly other workflows, creating such a new csv file would entail generating one with all the entries for the dataset, running some diff to select only the new entries (while retaining the header), and then providing the result to nda-tools. I would say this is a common use case for other workflows as well, since I hope people do not organize the data they analyze locally just to cater to the NDA "incremental" submission requirement. Like any "automation by human actions", it is boring and bug-prone (forgotten or duplicate entries), and it would be great to automate it...

Suggestion: I am not sure yet whether it should be part of vtcmd or some independent helper (e.g. nda-diff, ref #7). In the case of nda-diff, it could take two csv files (e.g. submission-201901.csv and submission-201907.csv) and produce a new submission-201907-incremental.csv. If coded in a modular fashion, the same internal nda_diff function could be triggered by adding a --since FILE option to vtcmd, which would avoid creating the -incremental.csv file and just do the diff before generating the package. That would simplify incremental submission and eliminate possible human-caused bugs.
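To make the idea concrete, here is a minimal sketch of what such an nda_diff helper might look like. Nothing here exists in nda-tools today: the function name, its signature, and the header_rows parameter are all hypothetical. It assumes that both files begin with the same fixed number of header rows (2 by default, on the assumption that the files carry a structure name/version row followed by a column-name row) and that an entry counts as "already uploaded" only if the entire row is identical.

```python
import csv
import io
import sys


def nda_diff(old_csv, new_csv, header_rows=2):
    """Return the header rows plus the rows of new_csv absent from old_csv.

    Hypothetical helper, not part of nda-tools. Both arguments are
    file-like objects (or any iterable of lines). Rows are compared as
    whole tuples, so an entry whose metadata changed is treated as new.
    """
    old_rows = list(csv.reader(old_csv))
    new_rows = list(csv.reader(new_csv))

    header = new_rows[:header_rows]
    if old_rows[:header_rows] != header:
        raise ValueError("header rows differ; refusing to diff")

    # Everything already present in the previous submission.
    seen = {tuple(row) for row in old_rows[header_rows:]}

    incremental = []
    for row in new_rows[header_rows:]:
        key = tuple(row)
        if not row or key in seen:
            continue  # skip blank lines and already-known entries
        seen.add(key)  # also drops duplicates within the new file
        incremental.append(row)
    return header + incremental


if __name__ == "__main__":
    # Tiny in-memory demo; the column names are made up for illustration.
    old = io.StringIO("image,03\nsubjectkey,image_file\nS1,a.nii.gz\n")
    new = io.StringIO(
        "image,03\nsubjectkey,image_file\nS1,a.nii.gz\nS2,b.nii.gz\n"
    )
    csv.writer(sys.stdout).writerows(nda_diff(old, new))
```

With real files, the two arguments would be opened with open(path, newline="") for something like submission-201901.csv and submission-201907.csv, and the result written out as submission-201907-incremental.csv. Whole-row comparison is the simplest choice; comparing on a subset of key columns instead would be a reasonable variation if corrected metadata should not trigger a re-upload.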