fmicompbio / einprot

Proteomics analysis workflows
https://fmicompbio.github.io/einprot/

multi assay summarizedExperiment from DIA-NN main report #17

Closed tobiasko closed 4 months ago

tobiasko commented 7 months ago

Dear einprot developers,

a typical DIA-NN main report contains multiple abundance estimators for different feature levels. The package example data shows:

PG.Quantity | PG.Normalised | PG.MaxLFQ | Genes.Quantity | Genes.Normalised | Genes.MaxLFQ | Genes.MaxLFQ.Unique

So abundance at the protein group (PG.) and gene (Genes.) levels is estimated by Quantity, Normalised (quantity), MaxLFQ and, for genes, MaxLFQ using only unique chromatographic features. On the PG level one would therefore already have 3 assays available for import into an SE container. All of them reference the same set of features, so this should fit the container definition:

https://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html

Have you thought about something like this instead of reducing to a single assay when building the container?

csoneson commented 7 months ago

Hi @tobiasko - indeed, this should be what is currently done:

> sce <- importDIANN(inFile = system.file("extdata/diann_example/PXD028735.report.tsv", package = "einprot"), 
                     fileType = "main_report", outLevel = "pg", aName = "PG.Quantity")$sce
> SummarizedExperiment::assayNames(sce)
[1] "PG.Quantity"       "PG.Normalised"     "PG.MaxLFQ"         "PG.Q.Value"        "Global.PG.Q.Value"
[6] "Lib.PG.Q.Value"    "Protein.Ids"

tobiasko commented 7 months ago

Ahhhh nice! So aName is just a selector for the assay that is used for the statistical test, but all assays for a level are imported?

csoneson commented 7 months ago

Yes, exactly - einprot just needs to know where to start from when doing log-transformation, imputation, normalization and downstream analysis. But the other assays are exported as well.
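
As a minimal sketch of what this means in practice (toy matrices, not einprot output; the object name `se` and all values are invented for illustration), assays that share the same features and samples sit side by side in one SummarizedExperiment and can be pulled out by name:

```r
library(SummarizedExperiment)

# Toy data: 3 protein groups x 2 samples (values invented for illustration)
pg <- c("P1", "P2", "P3")
smp <- c("S1", "S2")
quantity <- matrix(c(100, 200, 300, 110, 190, 310), nrow = 3,
                   dimnames = list(pg, smp))
maxlfq <- quantity * 1.1

# Both assays share the same rows (features) and columns (samples)
se <- SummarizedExperiment(assays = list(PG.Quantity = quantity,
                                         PG.MaxLFQ = maxlfq))

# List the available assays, then work with one that was not the "main" one
assayNames(se)
log2_maxlfq <- log2(assay(se, "PG.MaxLFQ"))
```

The same `assayNames()` / `assay()` calls should apply to the object returned by `importDIANN()`, since the session above lists its assays with `assayNames()`.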

tobiasko commented 7 months ago

What is the Protein.Ids assay?

csoneson commented 7 months ago

That one contains the protein IDs assigned to a protein group - einprot detected that the values in this column sometimes differ between samples (for the same protein group), and thus this information could not be added as a single annotation column. In such cases, einprot reports it as an assay instead (here, a matrix with character entries). For example:

> assay(sce, "Protein.Ids")[c(1, 6, 23), ]
              LFQ_Orbitrap_AIF_Condition_A_Sample_Beta_01 LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01
G5EA36;I3L328 "0"                                         "0"                                         
O94826        "O94826"                                    "O94826"                                    
P31948        "P31948;H0YGI8;F5GXD8;F5H783"               "P31948;F5GXD8;F5H783"                      
              LFQ_Orbitrap_AIF_Condition_A_Sample_Gamma_01 LFQ_Orbitrap_AIF_Condition_B_Sample_Beta_01
G5EA36;I3L328 "G5EA36;I3L328"                              "0"                                        
O94826        "O94826"                                     "O94826"                                   
P31948        "P31948;H0YGI8;F5GXD8;F5H783"                "P31948;H0YGI8;F5GXD8;F5H783"              
              LFQ_Orbitrap_AIF_Condition_B_Sample_Gamma_01
G5EA36;I3L328 "G5EA36;I3L328"                             
O94826        "O94826"                                    
P31948        "P31948;H0YGI8;F5GXD8;F5H783"               
              LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01
G5EA36;I3L328 "G5EA36;I3L328"                             
O94826        "O94826"                                    
P31948        "P31948;H0YGI8;F5GXD8;F5H783"    
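
The idea behind this can be sketched in base R. This is only an illustration under stated assumptions, not einprot's actual implementation: the helper `variesWithinGroup()` is hypothetical, and the toy table merely borrows column names from the DIA-NN main report (rows and gene labels are invented).

```r
# Toy long-format table, loosely modelled on a DIA-NN main report
# (column names follow the report; the rows are invented for illustration)
report <- data.frame(
    Run = c("S1", "S2", "S1", "S2"),
    Protein.Group = c("P31948", "P31948", "O94826", "O94826"),
    Genes = c("geneA", "geneA", "geneB", "geneB"),
    Protein.Ids = c("P31948;H0YGI8", "P31948", "O94826", "O94826"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `column` takes more than one value
# within at least one protein group
variesWithinGroup <- function(df, column) {
    nUnique <- tapply(df[[column]], df$Protein.Group,
                      function(x) length(unique(x)))
    any(nUnique > 1)
}

# Constant within each group: could be stored as a single annotation column
variesWithinGroup(report, "Genes")
# Differs between samples for one group: needs a feature x sample matrix
variesWithinGroup(report, "Protein.Ids")

# Reshape the varying column into a character matrix
# (rows = protein groups, columns = runs)
idMatrix <- tapply(report$Protein.Ids,
                   list(report$Protein.Group, report$Run),
                   function(x) x[1])
idMatrix
```

A column that is constant within every protein group collapses to one value per feature, whereas one that differs between runs only fits into a matrix with one entry per feature and sample.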

tobiasko commented 7 months ago

🧐 The protein IDs assigned to a protein group are NOT the same across all analysed samples? I really wonder why DIA-NN does this and what it could mean!?

csoneson commented 7 months ago

Good question :) that I have not yet found the answer to. I was thinking that perhaps in the second sample there was no evidence for H0YGI8 being present, and thus it did not contribute to the third protein group above in that sample.

csoneson commented 4 months ago

I think the original question here was addressed, so I'll close the issue - feel free to reopen or open a new issue if there are additional questions!