Closed timosachsenberg closed 4 years ago
@timosachsenberg can you draft a proposal based on MSstats? I don't really understand what it means that biological samples are grouped,
and how this is different from the experimental factor,
i.e. the variable that will finally be evaluated.
MSstats has the concepts of:
Run, Fraction, TechRepMixture, Channel, Condition, BioReplicate and Mixture
where:
For example, if 'TechRepMixture' = 1, 2 are the two technical replicates of one mixture, then they should have the same 'Mixture' value.
One technical replicate of one mixture might e.g. be recorded with multiple fractions. For example, if 'Fraction' = 1, 2, 3 are three fractions of the first technical replicate of one TMT mixture of biological subjects, then they should have the same 'TechRepMixture' and 'Mixture' value.
BioReplicate : Unique ID for a biological sample.
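To make the grouping rule above concrete, here is a minimal sketch (not MSstats code) that groups runs by their 'Mixture' and 'TechRepMixture' values, so that all fractions of one technical replicate end up together. The file names and values are invented for illustration, and `group_runs` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical annotation rows; run names and values are made up.
rows = [
    {"Run": "f1.raw", "Fraction": 1, "TechRepMixture": 1, "Mixture": 1},
    {"Run": "f2.raw", "Fraction": 2, "TechRepMixture": 1, "Mixture": 1},
    {"Run": "f3.raw", "Fraction": 3, "TechRepMixture": 1, "Mixture": 1},
    {"Run": "f4.raw", "Fraction": 1, "TechRepMixture": 2, "Mixture": 1},
    {"Run": "f5.raw", "Fraction": 2, "TechRepMixture": 2, "Mixture": 1},
    {"Run": "f6.raw", "Fraction": 3, "TechRepMixture": 2, "Mixture": 1},
]


def group_runs(rows):
    """Map (Mixture, TechRepMixture) to a {Fraction: Run} dictionary."""
    groups = defaultdict(dict)
    for row in rows:
        key = (row["Mixture"], row["TechRepMixture"])
        if row["Fraction"] in groups[key]:
            # The same fraction listed twice for one replicate is an error.
            raise ValueError(f"duplicate fraction {row['Fraction']} in {key}")
        groups[key][row["Fraction"]] = row["Run"]
    return dict(groups)


grouped = group_runs(rows)
```

Here both technical replicates share 'Mixture' = 1, and the three fractions of each replicate share the same 'TechRepMixture' value, which is what the description above requires.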
I think we need to model these in the specification to be able to analyze multiplexed data e.g., with channel swaps etc.
So let's take a look at what we have:
- Run="Comment [data file]"
- Channel=Comment [Label]
- Fraction=Comment [Fraction Identifier]
- BioReplicate="Source Name" (why optional in spec?)
Should we make it "Mandatory" then?
- Condition="Factor Value[NAME OF THE CONDITION]" (what if conditions span multiple factors?)
If you have multiple factors, you add multiple columns. This is really simple: nothing stops you from providing multiple Factor Value[CONDITION 1], Factor Value[CONDITION 2] ... Factor Value[CONDITION n]
- Mixture = Currently not modeled. Absolutely needed to group all fractions together!
- TechRepMixture=Currently not modeled. Needed to distinguish Mixtures that contain exactly the same labels and samples
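The mapping discussed in this list could be sketched roughly as follows. This is only an illustration under several assumptions: the column names are copied as written in this thread (real SDRF files may differ in case and spacing), `to_msstats_row` is a hypothetical helper, joining several Factor Value columns with "_" is one possible convention rather than an agreed rule, and Mixture and TechRepMixture are left empty because they are not modeled yet.

```python
# MSstats column -> SDRF column, as proposed in this thread.
# None marks concepts that currently have no SDRF counterpart.
MSSTATS_FROM_SDRF = {
    "Run": "Comment [data file]",
    "Channel": "Comment [Label]",
    "Fraction": "Comment [Fraction Identifier]",
    "BioReplicate": "Source Name",
    "Mixture": None,
    "TechRepMixture": None,
}


def to_msstats_row(sdrf_row):
    """Translate one SDRF row (a dict) into an MSstats-like row (a dict)."""
    row = {}
    for msstats_col, sdrf_col in MSSTATS_FROM_SDRF.items():
        row[msstats_col] = sdrf_row.get(sdrf_col) if sdrf_col else None
    # Assumption: all factor value columns are joined into one condition,
    # in alphabetical order of the column name.
    factors = [
        value
        for name, value in sorted(sdrf_row.items())
        if name.startswith("Factor Value[")
    ]
    row["Condition"] = "_".join(factors)
    return row


# Invented example row.
example = to_msstats_row({
    "Source Name": "sample 1",
    "Comment [data file]": "run1.raw",
    "Comment [Label]": "TMT126",
    "Comment [Fraction Identifier]": "1",
    "Factor Value[disease]": "control",
    "Factor Value[time]": "0h",
})
```

The two `None` entries make the gap visible: without extra columns, nothing in the row says which fractions belong to the same mixture.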
Can you elaborate on what these Mixtures are?
I mean: what if ONE condition spans multiple factors?
Mixture: you can have samples 1,2,3,4,5,6 measured in TMT6plex, but also as e.g. 6,5,4,3,2,1 (aka channel swap). If you have multiple fractions, you cannot distinguish the fractions corresponding to 1,2,3,4,5,6 from the ones corresponding to 6,5,4,3,2,1 unless you have the mixture identifier.
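A small sketch of the channel-swap case described above: two TMT6plex mixtures with the sample order reversed, each measured in two fractions. The number of fractions and the helper `is_unambiguous` are assumptions for illustration. The check shows that (Fraction, Channel) alone does not identify a sample, while adding the mixture identifier does.

```python
# TMT6plex reporter channels.
channels = ["126", "127", "128", "129", "130", "131"]

# Mixture 1 holds samples 1..6, mixture 2 holds the swapped order 6..1.
rows = []
for mixture, samples in ((1, [1, 2, 3, 4, 5, 6]), (2, [6, 5, 4, 3, 2, 1])):
    for fraction in (1, 2):  # assumed: two fractions per mixture
        for channel, sample in zip(channels, samples):
            rows.append({
                "Mixture": mixture,
                "Fraction": fraction,
                "Channel": channel,
                "BioReplicate": sample,
            })


def is_unambiguous(rows, key_columns):
    """True if the key columns always point to exactly one BioReplicate."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if seen.setdefault(key, row["BioReplicate"]) != row["BioReplicate"]:
            return False
    return True


without_mixture = is_unambiguous(rows, ["Fraction", "Channel"])
with_mixture = is_unambiguous(rows, ["Mixture", "Fraction", "Channel"])
```

In this toy data `without_mixture` is `False` (channel 126 in fraction 1 is sample 1 in one mixture and sample 6 in the other), whereas `with_mixture` is `True`.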
Hi, more examples for multiplexed experiments would be really helpful. I also like the idea of adopting MSstats annotations. Ideally in a way that the user can directly recycle the MSstats annotation file to annotate the data for PRIDE upload. Best, Melanie
@foellmelanie:
@timosachsenberg is working on an exporter from sdrf to MSstats (https://github.com/bigbio/sdrf-openms).
currently only lfq but multiplexed will come soonish
Hi, this is fantastic for an easy re-analysis! I see a lot of benefit in making the other direction (MSstats to sdrf) as easy as possible too. After spending a lot of time creating the MSstats annotation file, it would be really frustrating to have to do the same work all over again to generate the sdrf file for submitting the raw data to a public repository...
@foellmelanie :
Remember two things:
Our idea now is to have an easy automatic translation from SDRF -> MSstats file format, and guidelines on how to translate MSstats fields into SDRF fields.
Regards
Sounds great. I just wanted to point out that the easier MSstats can be translated into sdrf the better :)
I think we should give an example of how fractions of the same biological sample are grouped, ideally in the context of multiplexed data, as this is the most complex case. MSstats also considers e.g. pooled normalization channels and has the concept of mixtures. Basically, we should extend https://github.com/bigbio/proteomics-metadata-standard/tree/master/experimental-design#3-from-sample-to-msrun