CambridgeCentreForProteomics / course_expression_proteomics

https://cambridgecentreforproteomics.github.io/course_expression_proteomics/

Notes for MaxQuant users. #29

Closed csdaw closed 3 months ago

csdaw commented 4 months ago

I've written the following notes for MaxQuant users which should help them follow the course. I think they could just be added in plain markdown as an appendix.

Notes for MaxQuant users.

This course was written for proteomics data processed by the Proteome Discoverer software, as that is what the Cambridge Centre for Proteomics core facility uses to process DDA data (TMT and LFQ). Nevertheless, the workflow and basic principles discussed are also applicable to the output of any similar proteomics raw data processing software, including MaxQuant.

Here I have outlined the differences to be aware of when following this course using MaxQuant output text files. The code as written will require some minor modifications to work properly with MaxQuant-formatted data.

  1. The rough equivalent of the PSMs.txt file output by Proteome Discoverer is the evidence.txt file output by MaxQuant.
  2. Proteome Discoverer automatically filters out decoy PSMs that pass the score threshold, but MaxQuant does not. When working with MaxQuant output, you therefore need to remove rows with '+' in the Reverse column.
  3. Equivalent PSM-level column names and the type of data they contain are listed below, in the form PD PSMs.txt column = MaxQuant evidence.txt column. An ellipsis (...) marks a column with no equivalent.
    • Abundance (float) = Reporter.intensity.corrected (integer)
    • Sequence (string) = Sequence (string)
    • Master.Protein.Accessions (string) = Leading.proteins (string)
    • Master.Protein.Descriptions (string) = ...
    • Contaminants (string, True or False) = Potential.contaminant (string, + or blank)
    • ... = Reverse (string, + or blank)
    • Rank (integer) = ...
    • Search.Engine.Rank (integer) = ...
    • PSM.Ambiguity (string) = ...
    • Number.of.Protein.Groups (integer) = ... (you might calculate this by counting the number of ; in the Leading.proteins column and adding 1)
    • Average.Reporter.SN (float) = ... (you might calculate the average reporter ion intensity and threshold based on that instead)
    • Isolation.Interference.in.Percent (float) = PIF (float, to get the data in exactly the same format you have to calculate (1 - PIF)*100)
    • SPS.Mass.Matches.in.Percent (integer) = ...
  4. Equivalent protein-level column names and the type of data they contain are listed below, in the form PD Proteins.txt column = MaxQuant proteinGroups.txt column.
    • Accession (string) = Majority.protein.IDs (string)
    • Protein.FDR.Confidence.Combined (string; High, Medium, or Low) = Q.value (float, a Proteome Discoverer protein FDR of 'High' is equivalent to a Q.value < 0.01)
  5. When combining the MaxQuant evidence.txt and proteinGroups.txt tables, you should join on the following columns:
    • Protein.group.IDs (evidence.txt) = id (proteinGroups.txt)
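
The points above can be sketched in code. What follows is a minimal, hedged Python illustration on made-up rows, not the course's own code. It assumes the dotted column names listed above, that every cell is read as a string, and that Protein.group.IDs may hold several ids separated by ';'. The function names, the added High.confidence column, and all toy values are hypothetical.

```python
# Toy stand-ins for rows read from evidence.txt and proteinGroups.txt.
# All values are invented; every cell is a string, as in a text file.
evidence = [
    {"Sequence": "PEPTIDEK", "Leading.proteins": "P1", "Reverse": "",
     "Potential.contaminant": "", "PIF": "0.75", "Protein.group.IDs": "0"},
    {"Sequence": "DECOYK", "Leading.proteins": "REV__P2", "Reverse": "+",
     "Potential.contaminant": "", "PIF": "0.5", "Protein.group.IDs": "3"},
    {"Sequence": "SHAREDR", "Leading.proteins": "P1;P3", "Reverse": "",
     "Potential.contaminant": "", "PIF": "0.75", "Protein.group.IDs": "0;1"},
    {"Sequence": "KERATINK", "Leading.proteins": "CON__K1", "Reverse": "",
     "Potential.contaminant": "+", "PIF": "0.5", "Protein.group.IDs": "2"},
]
protein_groups = [
    {"id": "0", "Majority.protein.IDs": "P1", "Q.value": "0.001"},
    {"id": "1", "Majority.protein.IDs": "P3", "Q.value": "0.02"},
    {"id": "2", "Majority.protein.IDs": "CON__K1", "Q.value": "0.001"},
]


def is_flagged(row, column):
    """Return True when a MaxQuant flag column holds '+' (blank otherwise)."""
    return row.get(column, "").strip() == "+"


def drop_decoys(rows):
    """Point 2: remove decoy hits, i.e. rows with '+' in the Reverse column."""
    return [row for row in rows if not is_flagged(row, "Reverse")]


def add_pd_style_columns(rows):
    """Point 3: derive Proteome Discoverer style columns from MaxQuant ones."""
    annotated = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        # Number of ';' separators in Leading.proteins, plus 1.
        new_row["Number.of.Protein.Groups"] = row["Leading.proteins"].count(";") + 1
        # PIF is a fraction, so isolation interference in percent is (1 - PIF) * 100.
        new_row["Isolation.Interference.in.Percent"] = (1 - float(row["PIF"])) * 100
        # '+' or blank becomes True or False, as in the PD Contaminants column.
        new_row["Contaminants"] = is_flagged(row, "Potential.contaminant")
        annotated.append(new_row)
    return annotated


def join_protein_groups(rows, groups):
    """Point 5: join Protein.group.IDs (evidence) to id (proteinGroups).

    Emits one output row per (evidence row, protein group id) pair and
    skips ids that have no matching protein group (an inner join).
    """
    groups_by_id = {group["id"]: group for group in groups}
    joined = []
    for row in rows:
        for group_id in row["Protein.group.IDs"].split(";"):
            group = groups_by_id.get(group_id)
            if group is None:
                continue
            merged = dict(row)
            merged["Majority.protein.IDs"] = group["Majority.protein.IDs"]
            merged["Q.value"] = float(group["Q.value"])
            # Point 4: a PD protein FDR of 'High' corresponds to Q.value < 0.01.
            merged["High.confidence"] = merged["Q.value"] < 0.01
            joined.append(merged)
    return joined


filtered = drop_decoys(evidence)
annotated = add_pd_style_columns(filtered)
joined = join_protein_groups(annotated, protein_groups)

for row in joined:
    print(row["Sequence"], row["Majority.protein.IDs"], row["High.confidence"])
```

Duplicating a shared PSM once per protein group is only one possible choice; depending on the analysis you may prefer to drop PSMs where Number.of.Protein.Groups is greater than 1 before joining.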