bigbio / pmultiqc

A library for QC reports based on the MultiQC framework
GNU General Public License v3.0

The number of peptides finally identified is inconsistent in mzTab and csv #20

Closed daichengxin closed 3 years ago

daichengxin commented 3 years ago

In the UPS1 report, the number of peptides in the Pipeline Result Statistics table is extracted from the MSstats csv file, and the number of peptides in the Spectra Tracking table is extracted from the PSM information in the mzTab file. Why is there a difference?
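To make the two counts concrete, here is a minimal sketch of how the numbers could be derived from each source. It is not pmultiqc's actual code: the helper names (`read_mztab_section`, `unique_psm_peptides`, `unique_msstats_peptides`) and the toy data are made up for illustration, and it assumes the mzTab PSM section has a `sequence` column and the MSstats input csv has a `PeptideSequence` column.

```python
import csv
import io

# Tiny in-memory stand-ins for the two files (hypothetical data).
MZTAB_TEXT = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP1",
    "PSM\tPEPTIDEK\t2\tP1",
    "PSM\tELVISK\t3\tP2",
    "PSM\tLIVESR\t4\tP3",
])

MSSTATS_TEXT = "\n".join([
    "ProteinName,PeptideSequence,PrecursorCharge,Intensity,Reference",
    "P1,PEPTIDEK,2,1000.0,run1.mzML",
    "P1,PEPTIDEK,2,1200.0,run2.mzML",
    "P2,ELVISK,2,500.0,run1.mzML",
])


def read_mztab_section(text, header_prefix, row_prefix):
    """Return the rows of one mzTab section as a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == header_prefix:
            header = fields[1:]
        elif fields[0] == row_prefix and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


def unique_psm_peptides(mztab_text):
    """Unique peptide sequences in the PSM section (assumed column: 'sequence')."""
    return {row["sequence"] for row in read_mztab_section(mztab_text, "PSH", "PSM")}


def unique_msstats_peptides(csv_text):
    """Unique peptide sequences in the csv (assumed column: 'PeptideSequence')."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["PeptideSequence"] for row in reader}


psm_peptides = unique_psm_peptides(MZTAB_TEXT)
msstats_peptides = unique_msstats_peptides(MSSTATS_TEXT)
print(len(psm_peptides), len(msstats_peptides))
```

In this toy input the PSM section yields three unique peptides and the csv yields two, which is the kind of gap described in the report.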


daichengxin commented 3 years ago

21

ypriverol commented 3 years ago

@timosachsenberg @jpfeuffer it may be because the csv contains only the quantified peptides (an intensity in at least one condition), whereas the mzTab contains all identified PSMs. Probably for a lot of peptides not even one feature can be found?

jpfeuffer commented 3 years ago

In this case it becomes more, right? Was targeted_only set to false? Then it is probably the matching between runs that finds a quantity for a Peptide which did not have an ID in this run.

ypriverol commented 3 years ago

> In this case it becomes more, right?

The mzTab ones should be higher than the msstats output (as we are getting now) right?

> Was targeted_only set to false? Then it is probably the matching between runs that finds a quantity for a Peptide which did not have an ID in this run.

I think it is the other way around: mzTab > msstats

jpfeuffer commented 3 years ago

which msstats csv file? the one from proteomicslfq as input for MSstats (out_msstats.csv) or the one after processing with MSstats (msstats_results.csv)?

jpfeuffer commented 3 years ago

If it is the second, then Yasset is right. If it is the first, they should be equal. But there is also aggregation happening for the best feature across fractions etc. I need more information on which peptides/features go "missing" to give an exact answer.

ypriverol commented 3 years ago

> which msstats csv file? the one from proteomicslfq as input for MSstats (out_msstats.csv) or the one after processing with MSstats (msstats_results.csv)?

The input to MSstats.

jpfeuffer commented 3 years ago

Are you counting decoy peptides?
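A quick way to see whether decoys change the count is to tally the PSM peptides with and without them. This is only a sketch: the decoy column name `opt_global_cv_MS:1002217_decoy_peptide` and its `0`/`1` values are assumptions about the mzTab export, and the helper names and rows are invented.

```python
# Assumption: decoys are flagged in an optional column with this name,
# where "1" marks a decoy PSM. Adjust to whatever the mzTab really contains.
DECOY_COLUMN = "opt_global_cv_MS:1002217_decoy_peptide"

# Hypothetical PSM section with one decoy hit.
MZTAB_TEXT = "\n".join([
    "PSH\tsequence\tPSM_ID\taccession\t" + DECOY_COLUMN,
    "PSM\tPEPTIDEK\t1\tP1\t0",
    "PSM\tELVISK\t2\tP2\t0",
    "PSM\tKEDITPEP\t3\tDECOY_P1\t1",
])


def read_psm_rows(mztab_text):
    """Parse the PSM section of an mzTab text into a list of dicts."""
    header, rows = None, []
    for line in mztab_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "PSH":
            header = fields[1:]
        elif fields[0] == "PSM" and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


def unique_peptides(rows, include_decoys):
    """Unique sequences, optionally dropping rows flagged as decoys."""
    return {
        row["sequence"]
        for row in rows
        if include_decoys or row.get(DECOY_COLUMN, "0") != "1"
    }


rows = read_psm_rows(MZTAB_TEXT)
with_decoys = unique_peptides(rows, include_decoys=True)
targets_only = unique_peptides(rows, include_decoys=False)
print(len(with_decoys), len(targets_only))
```

If the two numbers differ on the real file, decoys would account for at least part of the discrepancy.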

jpfeuffer commented 3 years ago

Also MSstats only gets the PEP section, not the PSM section

daichengxin commented 3 years ago

I separately calculated the differences between the PEP and PSM sections of the mzTab and the peptides identified in out_msstats. It looks a little strange. The out_msstats sheet shows the peptides that appear in out_msstats but not in the PSM or PEP section.
https://docs.google.com/spreadsheets/d/1vwBD--OUx2DJVgsHA-564SghOdhUlUHO-5_sFbVj4pY/edit?usp=sharing
https://docs.google.com/spreadsheets/d/1xzPblmtsB1FuUV4PMt8okhEFG6Q9ltZRALPZAh_4U_M/edit?usp=sharing
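The comparison described here boils down to set differences between three peptide collections. A minimal sketch, assuming the sequences from the three sources are written in the same notation and can be compared as plain strings; the function name `compare_peptide_sets` and the example peptides are hypothetical.

```python
def compare_peptide_sets(psm, pep, msstats):
    """Return the differences between the three peptide sets as a dict of sets."""
    return {
        "msstats_not_in_psm": msstats - psm,
        "msstats_not_in_pep": msstats - pep,
        "msstats_in_neither": msstats - (psm | pep),
        "psm_not_in_msstats": psm - msstats,
        "pep_not_in_msstats": pep - msstats,
    }


# Hypothetical peptide sets standing in for the three sources.
psm = {"PEPTIDEK", "ELVISK", "LIVESR"}
pep = {"PEPTIDEK", "ELVISK"}
msstats = {"PEPTIDEK", "ELVISK", "MYSTERYK"}

diff = compare_peptide_sets(psm, pep, msstats)
for name, peptides in diff.items():
    print(name, sorted(peptides))
```

The `msstats_in_neither` entry corresponds to the surprising case above: peptides present in out_msstats that appear in neither mzTab section.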

jpfeuffer commented 3 years ago

I unfortunately do not have access to the docs.

daichengxin commented 3 years ago

https://docs.google.com/spreadsheets/d/1vwBD--OUx2DJVgsHA-564SghOdhUlUHO-5_sFbVj4pY/edit?usp=sharing
https://docs.google.com/spreadsheets/d/1xzPblmtsB1FuUV4PMt8okhEFG6Q9ltZRALPZAh_4U_M/edit?usp=sharing
Can you try again?

jpfeuffer commented 3 years ago

How did you calculate the difference? Did you check for a non-zero "study_variable_abundance" for a specific file in the PEP section and compare it with the rows in the msstats output that have the same "Reference" file?
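The per-file check asked about here could look roughly like the sketch below. Several things are assumptions: the PEP abundance columns are taken to be named `peptide_abundance_study_variable[n]`, the mapping from study variable index to "Reference" file (`SV_TO_REFERENCE`) is supplied by hand rather than read from the mzTab metadata, every msstats row is treated as a quantification for its "Reference" file, and all rows are invented.

```python
def is_quantified(value):
    """True for a numeric abundance above zero; 'null', blanks and None count as missing."""
    try:
        return float(value) > 0.0
    except (TypeError, ValueError):
        return False


def quantified_per_reference(pep_rows, sv_to_reference):
    """Peptides with a non-zero abundance, grouped by the file each study variable maps to."""
    result = {ref: set() for ref in sv_to_reference.values()}
    for row in pep_rows:
        for sv_index, ref in sv_to_reference.items():
            column = f"peptide_abundance_study_variable[{sv_index}]"
            if is_quantified(row.get(column)):
                result[ref].add(row["sequence"])
    return result


def msstats_per_reference(msstats_rows):
    """Peptides listed in the msstats rows, grouped by their 'Reference' file."""
    result = {}
    for row in msstats_rows:
        result.setdefault(row["Reference"], set()).add(row["PeptideSequence"])
    return result


# Hypothetical mapping and rows; real data would come from the mzTab and the csv.
SV_TO_REFERENCE = {1: "run1.mzML", 2: "run2.mzML"}
PEP_ROWS = [
    {"sequence": "PEPTIDEK",
     "peptide_abundance_study_variable[1]": "1000.0",
     "peptide_abundance_study_variable[2]": "null"},
    {"sequence": "ELVISK",
     "peptide_abundance_study_variable[1]": "0",
     "peptide_abundance_study_variable[2]": "250.5"},
]
MSSTATS_ROWS = [
    {"PeptideSequence": "PEPTIDEK", "Reference": "run1.mzML"},
    {"PeptideSequence": "ELVISK", "Reference": "run2.mzML"},
    {"PeptideSequence": "ELVISK", "Reference": "run1.mzML"},
]

pep_sets = quantified_per_reference(PEP_ROWS, SV_TO_REFERENCE)
ms_sets = msstats_per_reference(MSSTATS_ROWS)

mismatches = {}
for ref in sorted(set(pep_sets) | set(ms_sets)):
    pep = pep_sets.get(ref, set())
    ms = ms_sets.get(ref, set())
    mismatches[ref] = {"only_in_pep": pep - ms, "only_in_msstats": ms - pep}
    print(ref, mismatches[ref])
```

Comparing per file rather than over the whole experiment avoids counting a peptide as matched just because it was quantified in some other run.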