Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

Low protein ID compared to Sequest #98

Closed glnegri closed 4 years ago

glnegri commented 4 years ago

I consistently get fewer protein IDs (20% fewer, counting proteins with at least 1 unique peptide) and fewer peptide IDs (40% fewer, at 1% FDR) using the philosopher TMT pipeline described in the wiki than with a very similar Proteome Discoverer pipeline (same search parameters, Sequest HT as search engine, Percolator for FDR). Is this an expected result? Are there any search parameters in closed_fragger that can have such a large effect?

fcyu commented 4 years ago

Could you please send the log to us?

Thanks,

Fengchao

prvst commented 4 years ago

Hi @5utr, can you send me your commands or yml file, please?

glnegri commented 4 years ago

The experiment is a TMT 11 plex, split into 12 fractions. These are the commands:

```
# initialize
philosopher workspace --init

# reference proteome
philosopher database --custom uniprot_homo_2018.fasta --contam

# config MSFragger
java -jar MSFragger-2.2/MSFragger-2.2.jar --config

# run fragger
java -Xmx232g -jar MSFragger-2.2.jar closed_fragger.params *.mzML

# peptideprophet
philosopher peptideprophet --database *.fasta --ppm --accmass --expectscore --decoyprobs --combine --nonparam *.pepXML

# proteinprophet
philosopher proteinprophet *.pep.xml

# filter
philosopher filter --razor --pepxml *.pep.xml --protxml *.prot.xml

# TMT quant
philosopher labelquant --plex 11 --dir .

# report
philosopher report
```

prvst commented 4 years ago

Could you share your MSFragger parameter file as well?

anesvi commented 4 years ago

We need to see your fragger.param



glnegri commented 4 years ago

This is the parameter file used:

closed_fragger.params.txt (https://github.com/Nesvilab/philosopher/files/4100029/closed_fragger.params.txt)

prvst commented 4 years ago

@5utr

Could you also describe in a few words what kind of experiment you are trying to analyze (i.e. instrument type, sample type, MS2 or MS3 quantification, PTMs you are expecting to see, etc.)?

anesvi commented 4 years ago

Is it Multinotch-MS3?

Also,

variable_mod_03 = 14.016 K
variable_mod_04 = 229.162932 K

What is the 14 on K?

We normally search with TMT on K as fixed, but with TMT on the N-term as variable.



glnegri commented 4 years ago

Samples are a mix of human tissues and cell lines, 12 fractions, Orbitrap Fusion, MS2 quantification. PTMs are: TMT labels (229), Methylation K (14), Carboxyamidomethylation C (57), Oxidation M (15), Acetylation (42).

Usually I keep TMT as fixed, but some samples were treated with formalin, which is known to induce lysine methylation. That should affect only ~5% of the peptides.


anesvi commented 4 years ago

If this is MS2 quant, you need to change the fragment mass tolerance from 0.6 to 20, and the fragment mass units from 0 to 1 (i.e. 20 ppm). This is the main change.
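A minimal sketch of what this change means in practice. It first converts 20 ppm into an absolute window at a few fragment m/z values for comparison with the fixed 0.6 Da window, then rewrites the two settings in a params-style text. The key names `fragment_mass_tolerance` and `fragment_mass_units` are assumed to match the keys in closed_fragger.params, and the sample file text is hypothetical; check both against your own file.

```python
import re


def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute half-width (in Da) of a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def update_params(text: str, changes: dict[str, str]) -> str:
    """Rewrite `key = value` lines for the given keys, keeping trailing comments."""
    pattern = re.compile(r"^(\s*)([A-Za-z0-9_]+)(\s*=\s*)([^#]*?)(\s*(#.*)?)$")
    out = []
    for line in text.splitlines():
        m = pattern.match(line)
        if m and m.group(2) in changes:
            key = m.group(2)
            line = f"{m.group(1)}{key}{m.group(3)}{changes[key]}{m.group(5)}"
        out.append(line)
    return "\n".join(out)


# How wide is 20 ppm compared with a fixed 0.6 Da window?
for mz in (200.0, 500.0, 1000.0, 1500.0):
    print(f"m/z {mz:7.1f}: 20 ppm = +/-{ppm_to_da(mz, 20.0):.4f} Da (vs. +/-0.6 Da)")

# Hypothetical excerpt of a params file; key names are an assumption.
sample = (
    "fragment_mass_tolerance = 0.6\n"
    "fragment_mass_units = 0  # 0=Da, 1=ppm"
)
patched = update_params(
    sample,
    {"fragment_mass_tolerance": "20", "fragment_mass_units": "1"},
)
print(patched)
```

The point of the comparison is that a 0.6 Da window is far wider than 20 ppm across the usual fragment m/z range, so it does not take advantage of the high-resolution MS2 spectra.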

Are you expecting lysine methylation and TMT to be mutually exclusive? If not, i.e. if both can be on the same lysine, you need to change 'allow variable mods on the same residue' from 0 to 1.
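To illustrate the question, here is a simplified sketch (not MSFragger's actual implementation) that enumerates the mass shifts a single lysine can carry when both methylation and TMT are variable mods, with and without stacking allowed. The masses are the ones quoted in this thread; the function name and the `allow_multiple` flag are hypothetical stand-ins for the search-engine setting.

```python
from itertools import combinations

# Variable mass shifts on K as quoted in the thread (Da).
K_VARIABLE_MODS = {"methyl": 14.016, "TMT": 229.162932}


def lysine_states(mods: dict[str, float], allow_multiple: bool) -> dict[str, float]:
    """Enumerate the total mass shift of every modification state of one lysine."""
    states = {"unmodified": 0.0}
    # Without stacking, at most one variable mod can sit on the residue.
    max_size = len(mods) if allow_multiple else 1
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(mods), size):
            states["+".join(combo)] = sum(mods[name] for name in combo)
    return states


for allow in (False, True):
    print(f"stacking allowed: {allow}")
    for name, shift in lysine_states(K_VARIABLE_MODS, allow).items():
        print(f"  {name:12s} {shift:10.6f}")
```

With stacking off, a lysine that carries both a methyl group and a TMT label has no matching state, so peptides containing it cannot be identified.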


anesvi commented 4 years ago

Also:

Acetylation and TMT are mutually exclusive.

It makes no sense to add TMT as fixed on the N-term and still specify N-term acetylation.
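A small sketch of why this matters, assuming a simplified model in which at most one variable mod is added on top of the fixed N-terminal shift. The TMT mass is the one from the thread; the acetyl mass is the standard monoisotopic value (the thread only says "42"); the function name is hypothetical.

```python
TMT = 229.162932    # TMT label, as given in the thread (Da)
ACETYL = 42.010565  # standard monoisotopic acetylation shift (Da)


def nterm_shifts(fixed: float, variable: list[float]) -> list[float]:
    """N-terminal mass shifts searched: the fixed shift alone, or plus one variable mod."""
    return [fixed] + [fixed + v for v in variable]


# Configuration criticised above: TMT fixed on the N-term, acetylation variable.
fixed_tmt = nterm_shifts(TMT, [ACETYL])

# Alternative: nothing fixed on the N-term, TMT and acetylation both variable.
variable_tmt = nterm_shifts(0.0, [TMT, ACETYL])

print("TMT fixed:   ", [round(s, 6) for s in fixed_tmt])
print("TMT variable:", [round(s, 6) for s in variable_tmt])
```

With TMT fixed, the only acetylated state searched is TMT plus acetyl on the same N-terminus, which the two mods being mutually exclusive rules out. An N-terminus that is acetylated and therefore unlabeled is reachable only when TMT on the N-term is variable.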
