bigbio / proteomics-sample-metadata

The Proteomics Experimental Design file format: Standard for experimental design annotation
GNU General Public License v2.0

Provide example for multiplexed and fractionated sample #48

Closed timosachsenberg closed 4 years ago

timosachsenberg commented 4 years ago

I think we should give an example of how fractions of the same biological sample are grouped, ideally in the context of multiplexed data, as this is the most complex case. Also, MSstats considers e.g. pooled normalization channels and has the concept of mixtures. Basically we should extend https://github.com/bigbio/proteomics-metadata-standard/tree/master/experimental-design#3-from-sample-to-msrun

ypriverol commented 4 years ago

@timosachsenberg can you make a proposal based on MSstats? I don't really understand what it means that biological samples are grouped, and how this differs from the experimental factor, i.e. the variable that will finally be evaluated.

timosachsenberg commented 4 years ago

MSstats has the concepts of:

Run, Fraction, TechRepMixture, Channel, Condition, BioReplicate and Mixture

where:

For example, if 'TechRepMixture' = 1, 2 are the two technical replicates of one mixture, then they should have the same 'Mixture' value.

One technical replicate of one mixture might e.g. be recorded with multiple fractions. For example, if 'Fraction' = 1, 2, 3 are three fractions of the first technical replicate of one TMT mixture of biological subjects, then they should have the same 'TechRepMixture' and 'Mixture' value.

BioReplicate : Unique ID for a biological sample.

I think we need to model these in the specification to be able to analyze multiplexed data e.g., with channel swaps etc.
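To make the grouping rule concrete, here is a minimal sketch in Python. It is not MSstats code: it only reuses the column names listed above, and the run file names and the helper `runs_by_tech_rep` are made up for illustration. It builds one mixture with two technical replicates of three fractions each and groups the runs accordingly.

```python
from collections import defaultdict

# Hypothetical annotation rows: one TMT mixture, two technical replicates,
# three fractions each. File names are invented for illustration only.
rows = [
    {
        "Run": f"mix1_tr{tech_rep}_f{fraction}.raw",
        "Mixture": "1",
        "TechRepMixture": str(tech_rep),
        "Fraction": str(fraction),
    }
    for tech_rep in (1, 2)
    for fraction in (1, 2, 3)
]


def runs_by_tech_rep(rows):
    """Group run names by (Mixture, TechRepMixture)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Mixture"], row["TechRepMixture"])].append(row["Run"])
    return dict(groups)


groups = runs_by_tech_rep(rows)
for key, runs in sorted(groups.items()):
    # All fractions of one technical replicate share Mixture and TechRepMixture.
    print(key, runs)
```

Each group holds the fractions that have to be combined before quantification, which is the information the specification currently cannot express.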

timosachsenberg commented 4 years ago

So let's take a look at what we have:

ypriverol commented 4 years ago

So let's take a look at what we have:

  • Run = "Comment [data file]"
  • Channel = "Comment [Label]"
  • Fraction = "Comment [Fraction Identifier]"
  • BioReplicate = "Source Name" (why optional in spec?)

Should we then make it "Mandatory"?

  • Condition="Factor Value[NAME OF THE CONDITION]" (what if conditions span multiple factors?)

If you have multiple factors, you add multiple columns. This is really simple: nothing stops you from providing multiple columns Factor Value[CONDITION 1], Factor Value[CONDITION 2] ... Factor Value[CONDITION n]

  • Mixture = Currently not modeled. Absolutely needed to group all fractions together!
  • TechRepMixture=Currently not modeled. Needed to distinguish Mixtures that contain exactly the same labels and samples

Can you elaborate on what these Mixtures are?
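As a rough illustration of the mapping listed above, the following Python sketch translates one SDRF row into MSstats-style fields. It is an assumption-laden sketch, not the sdrf-openms implementation: the function `to_msstats_row`, the example values, and the choice to join several Factor Value columns with an underscore into a single Condition are all hypothetical. Mixture and TechRepMixture are left out because they are not modeled yet.

```python
# Column mapping taken from the list in this thread (names lower-cased).
SDRF_TO_MSSTATS = {
    "comment[data file]": "Run",
    "comment[label]": "Channel",
    "comment[fraction identifier]": "Fraction",
    "source name": "BioReplicate",
}


def to_msstats_row(sdrf_row):
    """Translate one SDRF row (dict) into MSstats-style fields.

    Assumption: every 'Factor Value[...]' column contributes to one combined
    Condition, joined with '_' in column order.
    """
    out = {}
    factor_values = []
    for column, value in sdrf_row.items():
        # Normalize case and the optional space before the bracket.
        name = column.strip().lower().replace(" [", "[")
        if name in SDRF_TO_MSSTATS:
            out[SDRF_TO_MSSTATS[name]] = value
        elif name.startswith("factor value["):
            factor_values.append(value)
    out["Condition"] = "_".join(factor_values)
    return out


# Hypothetical example row; all values are invented.
example = {
    "Source Name": "sample 1",
    "Comment [data file]": "run1.raw",
    "Comment [Label]": "TMT126",
    "Comment [Fraction Identifier]": "1",
    "Factor Value[treatment]": "drug",
    "Factor Value[time]": "24h",
}
result = to_msstats_row(example)
print(result)
```

Joining factor values is only one possible convention for a condition that spans several factors; whether the specification should prescribe it is exactly the open question here.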

timosachsenberg commented 4 years ago

I mean: what if ONE condition spans multiple factors?

Mixture: you can have samples 1,2,3,4,5,6 measured in one TMT6plex, but also measured as e.g. 6,5,4,3,2,1 in another (aka channel swap). If you have multiple fractions, you cannot distinguish the fractions corresponding to 1,2,3,4,5,6 from the ones corresponding to 6,5,4,3,2,1 unless you have the mixture identifier.
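A small Python sketch of the channel-swap case, under made-up assumptions: two mixtures "A" and "B", two fractions each, hypothetical sample names s1 to s6, and the six TMT6plex channels. The helpers `build_rows` and `lookup` are illustrative only. It shows that (Fraction, Channel) alone does not identify the sample, while adding the mixture does.

```python
channels = ["126", "127", "128", "129", "130", "131"]
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]  # hypothetical sample names


def build_rows(mixture, sample_order, n_fractions=2):
    """One row per (fraction, channel) of a single mixture."""
    return [
        {"Mixture": mixture, "Fraction": fraction,
         "Channel": channel, "BioReplicate": sample}
        for fraction in range(1, n_fractions + 1)
        for channel, sample in zip(channels, sample_order)
    ]


# Mixture A uses order 1..6, mixture B is the channel swap 6..1.
rows = build_rows("A", samples) + build_rows("B", samples[::-1])


def lookup(rows, key_fields):
    """Map each key to the set of samples observed under that key."""
    table = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        table.setdefault(key, set()).add(row["BioReplicate"])
    return table


without_mixture = lookup(rows, ["Fraction", "Channel"])
with_mixture = lookup(rows, ["Mixture", "Fraction", "Channel"])

# Keys that point to more than one sample are ambiguous.
ambiguous = [key for key, found in without_mixture.items() if len(found) > 1]
print(len(ambiguous), "ambiguous keys without the mixture identifier")
print(all(len(found) == 1 for found in with_mixture.values()))
```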

foellmelanie commented 4 years ago

Hi, more examples for multiplexed experiments would be really helpful. I also like the idea of adopting MSstats annotations. Ideally in a way that the user can directly recycle the MSstats annotation file to annotate the data for PRIDE upload. Best, Melanie

ypriverol commented 4 years ago

@foellmelanie:

@timosachsenberg is working on an exporter from SDRF to MSstats (https://github.com/bigbio/sdrf-openms).

timosachsenberg commented 4 years ago

Currently only LFQ, but multiplexed will come soonish.

foellmelanie commented 4 years ago

Hi, this is fantastic for an easy re-analysis! I also see a lot of benefit in making the other direction (MSstats to SDRF) as easy as possible. After spending a lot of time creating the MSstats annotation file, it would be really frustrating to have to do the same work all over again to generate the SDRF file for submitting the raw data to a public repository...

ypriverol commented 4 years ago

@foellmelanie :

Remember two things:

Our idea now is to have an easy automatic translation from SDRF to the MSstats file format, and guidelines on how to translate MSstats fields into SDRF fields.

Regards


foellmelanie commented 4 years ago

Sounds great. I just wanted to point out that the easier MSstats can be translated into sdrf the better :)