Some thoughts on derived data: I see four types of derived datasets to handle:
1. Add information or derived data measures
2. Add to the data annotations
3. Add data
4. Replace the data with data transforms
These issues need individual consideration:
For types (1) and (2), should simply adding new files require duplicating the data itself? Robert said that DOIs are not a full solution, since private data do not receive DOIs. Here it should be possible to propose standards for pointers to private data: by definition, retaining such data is the responsibility of the data owners, so if the pointer scheme fails, the fault is theirs, not BIDS'. When the data are made shareable, they should receive DOIs. Under this policy, data need NOT be duplicated. However, the validator must accept data with such non-DOI pointers only as 'private' data, not as 'shareable' data. [Note: Failing to create such a standard could effectively render the BIDS standard useless for storing 'living data' archives that accumulate a heterarchy of data annotations, derived measures, etc.]
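The DOI-versus-private-pointer rule proposed above could be checked mechanically. Below is a minimal Python sketch, assuming a pointer counts as 'shareable' only when it is a DOI (optionally prefixed with `doi:` or `https://doi.org/`) and as 'private' otherwise. The function names and the DOI regex (adapted from the commonly cited Crossref pattern) are hypothetical and not part of any BIDS validator:

```python
import re

# Hypothetical rule: a pointer is "shareable" only if it is a DOI.
# The pattern is adapted from the widely used Crossref DOI regex (an assumption here).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(pointer: str) -> str:
    """Strip common DOI prefixes so the bare identifier can be matched."""
    pointer = pointer.strip()
    for prefix in DOI_PREFIXES:
        if pointer.lower().startswith(prefix):
            return pointer[len(prefix):]
    return pointer


def classify_source(pointer: str) -> str:
    """Return 'shareable' for DOI pointers and 'private' for anything else."""
    if DOI_PATTERN.match(normalize_doi(pointer)):
        return "shareable"
    return "private"


if __name__ == "__main__":
    # Placeholder pointers, not real datasets.
    print(classify_source("https://doi.org/10.1234/example-dataset.v1"))
    print(classify_source("file:///lab/raid/study42/"))
```

With this policy, the data owner's non-DOI pointer is still accepted, but only ever labeled 'private'; once a DOI is minted, the same check promotes the reference to 'shareable' without duplicating any data.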
For type (4), the problem is to annotate how the included transformed data were derived from the original dataset. This could be as simple as a required statement, or as demanding as requiring inclusion of the code used to create the transformed data. Again, there is the problem of referencing the original data (DOI, or ??). Without a provenance standard, the BIDS standards may not remain useful for long.
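One way to record the provenance of type (4) data is a machine-readable description stored with the derivative. The sketch below is hypothetical: it borrows the `GeneratedBy` and `SourceDatasets` field names that later BIDS versions use in a derivative's `dataset_description.json`, but the helper function, the pinned `BIDSVersion` value, and all example values are assumptions. It accepts either a DOI or a non-DOI URL as the pointer back to the original data:

```python
import json


def make_derivative_description(name, pipeline, pipeline_version, code_url,
                                source_doi=None, source_url=None):
    """Build a minimal provenance record for a transformed (derivative) dataset.

    Field names follow the BIDS derivatives dataset_description.json
    (GeneratedBy, SourceDatasets); this helper itself is hypothetical.
    """
    source = {}
    if source_doi:
        source["DOI"] = source_doi
    if source_url:
        # Non-DOI pointer, e.g. to private data that has no DOI yet.
        source["URL"] = source_url
    if not source:
        # Refuse to describe a derivative that cannot point back to its origin.
        raise ValueError("a derivative must point back to its source data")
    return {
        "Name": name,
        "BIDSVersion": "1.4.0",  # assumed version, for illustration only
        "DatasetType": "derivative",
        "GeneratedBy": [
            {"Name": pipeline, "Version": pipeline_version, "CodeURL": code_url}
        ],
        "SourceDatasets": [source],
    }


if __name__ == "__main__":
    desc = make_derivative_description(
        "Filtered EEG", "hypothetical-filter-pipeline", "0.1.0",
        "https://example.org/code", source_url="file:///lab/raid/study42/")
    print(json.dumps(desc, indent=2))
```

Requiring at least one source pointer keeps a transformed dataset from losing track of its origin; `CodeURL` covers the "include the code" end of the spectrum, and a private URL can later be supplemented with a DOI once the data become shareable.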
Scott Makeig
@sappelhoff, how do we move this to the bep21 repo (or should we close it and link to it)?
@CPernet I transferred the issue to the bep021 repo
Dear all,
As many of you may be aware, we have been working hard on standardizing data archival and storage of electrophysiology data through extensions to the Brain Imaging Data Structure (BIDS) for EEG, iEEG and MEG.
While the focus so far was on raw files, the next step is to standardize the storage of so-called derivative or processed data. In this regard, we would like to survey the electrophysiology community on its present needs and to address them in a data-driven manner. We invite researchers in the community to take 5 minutes of their time to answer this poll: https://forms.gle/k8qpW1ddyk4Hqh5q8. Your feedback and suggestions are very important to us.
See the draft for the derivatives extension here: https://docs.google.com/document/d/1PmcVs7vg7Th-cGC-UrX8rAhKUHIzOI-uIOh69_mvdlw
Best, BIDS derivative team for electrophysiology