DataSeer / dataseer-web

DataSeer web application
GNU General Public License v3.0

Display "Data Re-Use" wiki advice when re-use detected #268

timvines19 closed this issue 3 years ago

timvines19 commented 3 years ago

If either the ML or the user checks the 'data re-use' box [NB it should be renamed from just 're-use'], the advice for 'Dataset re-use' from the wiki should be displayed (https://wiki.dataseer.ai/doku.php?id=data_type:dataset_reuse), regardless of the data type described by the sentence.
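
The rule requested here could be sketched roughly as below. This is only an illustration in Python, not DataSeer's actual code: the `Sentence` record, its `ml_reuse` and `user_reuse` fields, and the `advice_page_for` helper are all hypothetical names. Only the wiki page id `data_type:dataset_reuse` comes from the URL above.

```python
from dataclasses import dataclass

# Wiki page id taken from the URL quoted in the comment above.
REUSE_PAGE_ID = "data_type:dataset_reuse"


@dataclass
class Sentence:
    """Hypothetical record for one dataset sentence (not DataSeer's real model)."""
    data_type_page_id: str     # wiki page id of the assigned data type
    ml_reuse: bool = False     # re-use flag predicted by the ML model
    user_reuse: bool = False   # 'data re-use' box checked by the user


def advice_page_for(sentence: Sentence) -> str:
    """Return the id of the wiki page whose advice should be displayed."""
    # Re-use advice wins as soon as either source sets the flag,
    # whatever data type the sentence describes.
    if sentence.ml_reuse or sentence.user_reuse:
        return REUSE_PAGE_ID
    return sentence.data_type_page_id
```

Combining the two flags with a plain `or` means the user can still change the data type without losing the re-use advice.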

[Screenshot: Screen Shot 2021-04-16 at 2.36.26 PM]

NicolasKieffer commented 3 years ago

Should I display only the "Description" and "Best data format for sharing" parts of "Dataset Re-use", or should "Not applicable" be displayed too?

Because there is no "Most suitable repositories" available for "Dataset Re-use".

Like in the screenshots below ("Most suitable repositories" did not change):

[image]

[image]

timvines19 commented 3 years ago

I think we should have 'not applicable' appear under the 'Most suitable repositories'.

Users should still be able to change the data type, but we somehow need to make it clear that we stick with the same data re-use advice regardless of the data type [@samanthablankers maybe we should create a second 'data re-use' set of advice for all the data types? That would make it clearer to users what's happening]

samanthablankers commented 3 years ago

Okay, so would I keep the data type description the same between a given data type and that data type's re-use page? And then for best practice for sharing and suitable repositories, would I just copy what is on the dataset re-use wiki page?

timvines19 commented 3 years ago

@samanthablankers that's a good idea, e.g.

Flow Cytometry MeSH ID: D005434

Description: Technique using an instrument system for making, processing, and displaying one or more measurements on individual cells obtained from a cell suspension. Cells are usually stained with one or more fluorescent dyes specific to cell components of interest, e.g., DNA, and fluorescence of each cell is measured as it rapidly transverses the excitation beam (laser or mercury arc lamp). Fluorescence provides a quantitative measure of various biochemical and biophysical properties of the cell, as well as a basis for cell sorting. Other measurable optical parameters include light absorption and light scattering, the latter being applicable to the measurement of cell size, shape, density, granularity, and stain uptake.

Best practice for indicating re-use of existing data

For public datasets please provide a DOI or other stable identifier for the dataset itself and include a citation for the dataset in the reference list. Be sure to indicate exactly which data has been re-used. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed.

For access-controlled data authors should provide a link to instructions for obtaining access (e.g. here is the information page for ADNI (Alzheimer's Disease Neuroimaging Initiative): http://adni.loni.usc.edu/data-samples/access-data/).

Most suitable repositories: Not applicable

timvines19 commented 3 years ago

@NicolasKieffer will having the title as "Best practice for indicating re-use of existing data" instead of "Best practice for sharing this type of data" cause a problem? It's a bit confusing to have the second one.

I've updated the wiki entry for Data Re-use a bit: https://wiki.dataseer.ai/doku.php?id=data_type:dataset_reuse

NicolasKieffer commented 3 years ago

No problem, I will update DataSeer to handle it.

samanthablankers commented 3 years ago

I've updated the wiki with a re-use page for each data type, let me know if I need to make any changes @timvines19

timvines19 commented 3 years ago

Hi @samanthablankers, we discussed this today on the call. Having a separate page for each data type for re-use and original collection is going to make the drop-down menu in the UI very long. @NicolasKieffer suggested that each data type page be like this instead:

Chromatography MeSH ID: D002845

Description: Techniques used to separate mixtures of substances based on differences in the relative affinities of the substances for mobile and stationary phases. A mobile phase (fluid or gas) passes through a column containing a stationary phase of porous solid or liquid coated on a solid support. Usage is both analytical for small amounts and preparative for bulk amounts.

Best practice for sharing this type of data: Chromatograph data for each run should be preserved in ANDI-MS format. Numerical output from downstream analysis of the chromatograph should be shared as Tabular data: Tabular data should be saved as a .txt or .csv file. The first row(s) should contain information about the dataset, such as the data file name, author, today's date, when the data within the file were last modified, and companion file names. Please also state which symbol has been used to denote missing data (NA is preferred). Column headings should describe the content of each column and contain only numbers, letters, and underscores - no spaces or special characters. Lowercase letters are preferred. Row names should be consistent with those used in the article and in other related datasets.

Most suitable repositories: Chromatography data can be added to GlycoProtDB, Golm Metabolome Database, PRIDE, and Proteome-pI.

Best practice for indicating re-use of existing data: For public datasets please provide a DOI or other stable identifier for the dataset itself and include a citation for the dataset in the reference list. Be sure to indicate exactly which data has been re-used, particularly when multiple versions of the dataset exist. In many cases, this is best achieved by sharing the code used to extract the part of the data that you analyzed. In some cases it may be best to share the exact dataset(s) you analyzed as well.

For access-controlled data authors should provide a link to instructions for obtaining access (e.g. here is the information page for ADNI (Alzheimer's Disease Neuroimaging Initiative): http://adni.loni.usc.edu/data-samples/access-data/).

When re-using a private dataset from a previous study please contact the data owners to discuss how the data can be made public.

Most suitable repositories: Not applicable
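
With a page laid out like this, choosing what to show for a sentence might look roughly like the sketch below. It is written in Python purely for illustration and makes assumptions: `DataTypePage`, its field names, and `advice_sections` are hypothetical, and the thread does not say how DataSeer actually parses or stores the wiki content. The headings are the ones used in the page above.

```python
from dataclasses import dataclass
from typing import List, Tuple

NOT_APPLICABLE = "Not applicable"


@dataclass
class DataTypePage:
    """Hypothetical in-memory form of one wiki data type page."""
    name: str
    description: str
    sharing_advice: str   # "Best practice for sharing this type of data"
    repositories: str     # "Most suitable repositories" for original data
    reuse_advice: str     # "Best practice for indicating re-use of existing data"


def advice_sections(page: DataTypePage, is_reuse: bool) -> List[Tuple[str, str]]:
    """Return the (heading, text) pairs to display for a sentence."""
    if is_reuse:
        # Re-used data: keep the description, swap in the re-use advice,
        # and show no repository recommendation.
        return [
            ("Description", page.description),
            ("Best practice for indicating re-use of existing data", page.reuse_advice),
            ("Most suitable repositories", NOT_APPLICABLE),
        ]
    # Originally collected data: show the data type's own advice.
    return [
        ("Description", page.description),
        ("Best practice for sharing this type of data", page.sharing_advice),
        ("Most suitable repositories", page.repositories),
    ]
```

Keeping both sets of advice on one page means the drop-down menu only needs one entry per data type, and the re-use flag alone decides which headings appear.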

timvines19 commented 3 years ago

Sorry for the hassle - the separate data re-use pages for each data type will need to be removed.

timvines19 commented 3 years ago

@NicolasKieffer the wiki is now up to date

timvines19 commented 3 years ago

this has been implemented