bigbio / proteomics-sample-metadata

The Proteomics Experimental Design file format: Standard for experimental design annotation
GNU General Public License v2.0

New column needed for versions [PSI-Suggestion] #491

Closed ypriverol closed 3 years ago

ypriverol commented 3 years ago

We need to track the version of the file format in each SDRF. If the standard evolves over time (as expected), it would be great to know which version a specific file follows.

My proposal is to have a column comment[sdrf-p version] that controls the version of the file format.
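As a rough sketch of what consuming such a column could look like, the snippet below reads a tab-delimited SDRF and checks that every row declares the same version. The column name is the one proposed above; the function name, the example rows, and the version string `1.0` are made up for illustration and are not part of any existing parser.

```python
import csv
import io

VERSION_COLUMN = "comment[sdrf-p version]"  # column name proposed in this issue


def sdrf_version_from_column(text):
    """Return the single version declared in the version column.

    Raises ValueError if the column is missing or the rows disagree.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames is None or VERSION_COLUMN not in reader.fieldnames:
        raise ValueError(f"missing column: {VERSION_COLUMN}")
    # Short rows yield None for missing cells, so fall back to an empty string.
    versions = {(row[VERSION_COLUMN] or "").strip() for row in reader}
    if len(versions) != 1:
        raise ValueError(f"expected exactly one version, found {sorted(versions)}")
    return versions.pop()


# Hypothetical two-row SDRF; "1.0" is only a placeholder version.
example = (
    "source name\tcomment[data file]\tcomment[sdrf-p version]\n"
    "sample 1\trun1.raw\t1.0\n"
    "sample 2\trun2.raw\t1.0\n"
)
print(sdrf_version_from_column(example))
```

The consistency check is the price of this design: because the value is repeated on every row, a reader has to decide what to do when rows disagree.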

Comments @levitsky @mvaudel @mlocardpaulet @all

levitsky commented 3 years ago

On one hand this sure sounds redundant, having the same string repeated for all rows and basically representing something that doesn't belong in any row, as this is metadata but it is mixed with data here (or should I say, meta-metadata mixed with metadata?).

On the other hand, I do not see any good alternatives for adding this information within the same file. Perhaps a clean solution would be to optionally provide a separate small file with metadata. We could make it any format, probably a simple key: value would suffice, and fix its name (something like meta.txt or whatever), and for now it would only store the SDRF version (although we can discuss, maybe we could use it for other things, too?)
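To make the idea concrete, here is a minimal sketch of reading such a companion file, assuming one `key: value` pair per line. The key `sdrf version`, the value `1.0`, and the function name are hypothetical; nothing here is an agreed format.

```python
def parse_meta(text):
    """Parse 'key: value' lines into a dict, skipping blank lines."""
    meta = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {number}: expected 'key: value', got {line!r}")
        meta[key.strip()] = value.strip()
    return meta


# Hypothetical contents of meta.txt.
example_meta = "sdrf version: 1.0\n"
print(parse_meta(example_meta)["sdrf version"])
```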

mlocardpaulet commented 3 years ago

My personal opinion is that if we want everybody to generate the SDRF themselves for submission (MS users with no informatics background included), we need to keep it as easy as possible for them: so a new column (even if this is redundant) sounds good to me.

ypriverol commented 3 years ago

The other option would be the IDF, which is part of MAGE-TAB and contains general metadata (key:value), the link to the corresponding SDRF, and additional information such as protocols, versions, authors, etc. https://github.com/ebi-gene-expression-group/sc-metadata-fields/blob/master/IDF_template.txt

daichengxin commented 3 years ago

Personal opinion: if we just add version information, a new column is enough, even if it may be redundant. But the characteristic and comment prefixes should be avoided. Perhaps sdrf[version]?

This is because the attribute belongs neither to the sample nor to the proteomics data. If we want to add more information, a separate small metadata file is an option.

ypriverol commented 3 years ago

@anjaf do you know which prefix we can use for the SDRF version?

anjaf commented 3 years ago

I like @ypriverol's idea of making it a comment in the IDF, e.g. Comment[SDRF version]. This is how we usually include custom study-level annotations.
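For illustration, a reader for such an IDF entry might look like the sketch below. It assumes the usual IDF layout of one tag per line followed by tab-separated values; the case-insensitive tag match, the function name, and the example contents (including the version `1.0`) are assumptions made for this sketch.

```python
def idf_comment(text, name):
    """Return the first value of Comment[name] in an IDF, or None if absent."""
    wanted = f"comment[{name}]".lower()
    for line in text.splitlines():
        cells = line.split("\t")
        # The first cell is the tag; the remaining cells hold its values.
        if cells[0].strip().lower() == wanted:
            values = [cell.strip() for cell in cells[1:] if cell.strip()]
            return values[0] if values else None
    return None


# Hypothetical IDF fragment.
example_idf = (
    "Investigation Title\tExample study\n"
    "Comment[SDRF version]\t1.0\n"
    "SDRF File\texample.sdrf.tsv\n"
)
print(idf_comment(example_idf, "SDRF version"))
```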

ypriverol commented 3 years ago

@levitsky @mlocardpaulet @anjaf :

I was discussing this with @anjaf today, and we found out that SDRF accepts comment lines anywhere in the file, marked with #:

Blank lines containing zero or more spaces or tabs are permitted in any of these files. Lines starting with the “#” symbol are interpreted as comments.

What do you think about adding a general header to each file containing the SDRF version and other future metadata?

@timosachsenberg @mvaudel ?

levitsky commented 3 years ago

I think it's a practical solution. We will have to update the parser(s) so that it works on files with comments, but it's not that hard.
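A rough sketch of the parser change, following the rule quoted above (lines starting with "#" are comments, blank lines are permitted). The `key=value` syntax inside the comment lines, the key `sdrf version`, and the function names are assumptions for this example; the thread does not define a header syntax.

```python
import csv


def split_comments(text):
    """Separate '#' comment lines from data lines, dropping blank lines."""
    comments, data = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            comments.append(line[1:].strip())
        elif line.strip():
            data.append(line)
    return comments, data


def read_sdrf(text):
    """Return (header metadata, rows) for an SDRF with optional comment lines."""
    comments, data = split_comments(text)
    header = {}
    for comment in comments:
        # Hypothetical convention: '# key=value'; other comments are ignored.
        key, sep, value = comment.partition("=")
        if sep:
            header[key.strip()] = value.strip()
    rows = list(csv.DictReader(data, delimiter="\t"))
    return header, rows


# Hypothetical file with a one-line header and a single sample row.
example = (
    "# sdrf version=1.0\n"
    "\n"
    "source name\tcomment[data file]\n"
    "sample 1\trun1.raw\n"
)
header, rows = read_sdrf(example)
print(header, len(rows))
```

Stripping the comments before handing the lines to the tabular reader keeps the rest of the parsing unchanged, which is why the update should be small.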

ypriverol commented 3 years ago

We will move this information to the IDF; please read PR #505.