dandi / helpdesk

Repository to track help tickets from users.

Each data set should have a link to the accompanying study and associated code used for analyses. #74

Closed AngCamp closed 1 year ago

AngCamp commented 2 years ago

If you look at GEO (Gene Expression Omnibus), one of the largest repositories of public data, you will see that each dataset contains the abstract and a link to the related studies. Example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125068

Contributors to DANDI should submit publications or make note of when they plan to publish; preprints are acceptable as well.

Metadata alone is insufficient to fully understand the data; context from the experiment may be required. Further, you should not assume a user's experience. Many will be undergraduates and new graduate students who are still becoming familiar with many systems. As DANDI grows, tracking down where a dataset came from will become harder and harder. Also, future users may want to perform semantic analyses; without a structured link to the original publications, DANDI as it is currently set up creates unnecessary barriers. It is a simple fix to require submitters to provide a link to a publication or preprint, or to note that they have not published yet. Ideally, a link to Code Ocean and/or GitHub repos would be appropriate as well.

Additionally

Who would use this feature?

Anyone and everyone. If you are searching DANDI itself, you may not know what you're looking for when you go there. Why waste a user's time by forcing them to track down the experiment and the accompanying code used for previous analyses of the data? A link to the study provides context as to why and how a dataset was generated.

(Optional): Suggest a solution

Provide hyperlinks to the studies and repos. To encourage early data sharing, this should not be required, but it would be nice if contributors who don't supply a link were sent email reminders so they don't just forget and move on.

AngCamp commented 2 years ago

Just for example: https://dandiarchive.org/dandiset/000206

Googling the title of this dataset does not turn anything up. I do not know if this data corresponds to a published work or if there is even an associated preprint. I can try to track the authors down, but why should this be a user's responsibility? Again, please refer to the GEO link posted and consider the user experience.

satra commented 2 years ago

Thanks for the feedback, @AngCamp. As you have probably seen, the metadata does contain fields where a data submitter can put such information. The question is whether it should be enforced. We can probably provide an indicator of metadata quality. This issue was created to develop a checklist to remind users prior to publishing: https://github.com/dandi/dandi-archive/issues/1090

AngCamp commented 1 year ago

IMO it should be mildly enforced. If users want to upload data before they have even submitted a preprint that's fine and obviously should not be discouraged, but I think there should be a required field in the metadata section that at least forces them to answer whether or not it is attached to a publication or preprint.
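
To make the "mildly enforced" idea concrete, here is a minimal sketch of what such a check might look like. Everything in it is an assumption for illustration: the field names `publicationStatus` and `publicationUrl`, the allowed values, and the function itself are hypothetical and do not reflect DANDI's actual metadata schema or validation code. Asking for a link when a publication or preprint exists goes slightly beyond the required yes/no answer and is also just an assumption.

```python
from typing import Any

# Hypothetical allowed answers; not taken from the DANDI schema.
ALLOWED_STATUSES = {"published", "preprint", "not_yet_published"}


def publication_status_problems(metadata: dict[str, Any]) -> list[str]:
    """Return human-readable problems with the (hypothetical) publication fields.

    An empty list means the submitter has answered the question, which is
    all the mild enforcement asks for. Uploading before publishing is allowed.
    """
    problems: list[str] = []
    status = metadata.get("publicationStatus")  # hypothetical field name

    if status is None:
        # The only hard requirement: the question must be answered.
        problems.append("publicationStatus is required")
    elif status not in ALLOWED_STATUSES:
        problems.append(f"publicationStatus must be one of {sorted(ALLOWED_STATUSES)}")
    elif status != "not_yet_published" and not metadata.get("publicationUrl"):
        # Assumption: if a publication or preprint exists, ask for its link.
        problems.append("publicationUrl is missing for a published or preprint dataset")

    return problems
```

A dataset that declares `"not_yet_published"` passes without a link, so early sharing is not discouraged; only a missing answer is flagged.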