bigbio / proteomics-sample-metadata

The Proteomics sample metadata: Standard for experimental design annotation in proteomics datasets
GNU General Public License v2.0

Material type and Description columns #272

Closed by levitsky 4 years ago

levitsky commented 4 years ago

I see Material Type in many annotations but it is not specified anywhere in the standard description. Is it allowed, and if yes, why is it needed?

Material type sounds like a sample characteristic. If there is not a suitable term in EFO, perhaps one can be added? Alternatively, the use of Material Type needs to be described in the specification.

Upd: Another column that I see is "description". I think it is an attempt to describe the data set as a whole but it is repeated for every row. I don't think it belongs in SDRF.

ypriverol commented 4 years ago

Agreed, I think this is a legacy property from transcriptomics experiments. @anjaf, can you let us know why it is important?

levitsky commented 4 years ago

Right now we have organism part or cell type which kind of alternate, which is indeed not ideal. So I see why Material Type would help, e.g. if it is cell then you can look for cell type but if it is tissue then you can look for organism part, etc.

However, organism part and cell type are mandatory for all annotations anyway, so either one can be checked without any preconditions.
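A minimal sketch of the lookup described above, where the material type decides which characteristic to read. The column names, the `DETAIL_COLUMN` mapping, and the `detail_for` helper are assumptions made for illustration; none of them come from the specification.

```python
# Hypothetical mapping from a material type value to the column that
# describes it in more detail. "cell" and "tissue" are the examples
# used in the comment above; the column names are assumed.
DETAIL_COLUMN = {
    "cell": "characteristics[cell type]",
    "tissue": "characteristics[organism part]",
}


def detail_for(row, material_column="material type"):
    """Return (column, value) for the characteristic implied by the
    row's material type, or None if the material type is unknown."""
    material = (row.get(material_column) or "").strip().lower()
    column = DETAIL_COLUMN.get(material)
    if column is None:
        return None
    return column, row.get(column)


if __name__ == "__main__":
    row = {
        "material type": "Cell",
        "characteristics[cell type]": "hepatocyte",
        "characteristics[organism part]": "liver",
    }
    print(detail_for(row))
```

Since both characteristics are mandatory anyway, the same information can be read without this indirection, which is the point of the objection above.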

levitsky commented 4 years ago

@ypriverol I expanded the issue to include description as well, another column used but not described.

ypriverol commented 4 years ago

I will take a look. @levitsky, do you have more annotated projects coming soon? It would be nice to check this list: https://github.com/bigbio/proteomics-metadata-standard/issues/271

levitsky commented 4 years ago

We're working through our list of annotated projects. What I can say is that we focused on live human samples, not cell lines, so probably no intersections with that list.

anjaf commented 4 years ago

Yes, "Material Type" and "Description" come from the original MAGE-TAB specifications, and relate to the Source Name column. So Material Type basically means "source material type": what material was used as input at the start of the experiment? The controlled vocabulary is "whole organism", "organism part", "cell", "DNA", "RNA". DNA and RNA are obviously not applicable to proteomics experiments, so that leaves the other three. We also tend to avoid the "Description" field (as it is not very specific), in favour of the specific Characteristics fields. (We have a "description" field in Annotare for inexperienced submitters to use if they can't find any other option; curators then move the information from this field under the suitable Characteristics terms.)
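For illustration, a check against this vocabulary could look roughly like the sketch below. It is not part of any existing validator: the function name, the lower-cased "material type" column name, and the tab-separated layout are assumptions, and only the three values applicable to proteomics are accepted.

```python
import csv
import io

# Values quoted in the comment above, minus DNA and RNA, which are
# not applicable to proteomics.
ALLOWED_MATERIAL_TYPES = {"whole organism", "organism part", "cell"}


def invalid_material_types(sdrf_text, column="material type"):
    """Return (row_number, value) pairs whose material type falls
    outside the vocabulary. Rows are numbered from 1, header excluded.
    An absent column yields an empty list."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = [name.strip().lower() for name in next(reader, [])]
    if column not in header:
        return []
    index = header.index(column)
    problems = []
    for row_number, row in enumerate(reader, start=1):
        value = row[index].strip().lower() if index < len(row) else ""
        if value not in ALLOWED_MATERIAL_TYPES:
            problems.append((row_number, value))
    return problems


if __name__ == "__main__":
    sample = (
        "source name\tMaterial Type\n"
        "S1\tcell\n"
        "S2\tRNA\n"
        "S3\torganism part\n"
    )
    print(invalid_material_types(sample))
```

Matching is case-insensitive here purely for convenience; whether a real check should be stricter is an open choice.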

ypriverol commented 4 years ago

@levitsky I suggest that in order to be compatible with transcriptomics we allow the following column names:

These would be additional properties: we don't validate them, and it is up to the user to provide them.

levitsky commented 4 years ago

I don't have the context knowledge to opine on the necessity of keeping compatibility. Abstractly speaking, allowing "description" will probably tempt annotation authors to fill it with information that belongs in another column, or multiple columns, reducing the utility of annotation. To avoid this, any use of description should be actively discouraged, if allowed.

ypriverol commented 4 years ago

Let's remove Description.

levitsky commented 4 years ago

If it is decided, we can close this and merge https://github.com/bigbio/sdrf-pipelines/pull/36.