Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0
109 stars, 17 forks

Protein grouping in report tables #93

Closed JB91451 closed 4 years ago

JB91451 commented 4 years ago

Dear Felipe,

I'm not sure about the protein grouping applied in the different philosopher reports (or maybe this question is more related to ProteinProphet). In one of our current samples I found the following: in the peptide table there are 6 peptides that are assigned to protein A in the main column and are also mapped to another protein B (A is actually just a truncated version of B, so there should be no way to distinguish them in normal proteomics). However, the protein table lists only protein A, and the mapped protein column is empty.

Is this the intended behaviour as long as there is no evidence that protein B could also be present? Is it simply that protein A is shorter and can therefore explain all peptides with a higher sequence coverage than B, and will this change when I use the "noOccam" option? Unfortunately our database contains many such cases, because it includes a 6-frame translation with different start codons.
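Because the database is a 6-frame translation with alternative start codons, pairs like A and B can be found before the search: an N-terminally truncated variant is a proper suffix of the longer translation from the same frame. The following is a minimal sketch in Python (standard library only); `parse_fasta` and `find_truncated_variants` are hypothetical helpers, not part of philosopher. It compares every pair of entries, which is fine for a small excerpt but would need an index (for example, keyed by C-terminal residues) for a full 6-frame database.

```python
from itertools import combinations


def parse_fasta(text):
    """Return {identifier: sequence}; the identifier is the first word after '>'."""
    records = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            # Guard against a bare '>' line with no identifier.
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def find_truncated_variants(records):
    """Yield (short_id, long_id) pairs where the shorter sequence is a proper
    suffix of the longer one, i.e. the same frame read from a downstream start."""
    for (id1, s1), (id2, s2) in combinations(records.items(), 2):
        if len(s1) < len(s2) and s2.endswith(s1):
            yield id1, id2
        elif len(s2) < len(s1) and s1.endswith(s2):
            yield id2, id1


# The two shortened entries posted in this thread.
EXCERPT = """>sp|Seq_14258|Seq_14258
MAVVGAGAYLYNTARQKAGVTDDDVNIPGV
>sp|Seq_71829|Seq_71829
MTKVPLELLGIEVDTDDPGESAQNLGLGVIGVTLT
MAVVGAGAYLYNTARQKAGVTDDDVNIPGV
"""

pairs = list(find_truncated_variants(parse_fasta(EXCERPT)))
print(pairs)
```

On the posted excerpt this reports Seq_14258 as the truncated form of Seq_71829.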

Best regards, Juergen

anesvi commented 4 years ago

In psm.tsv, ion.tsv and peptide.tsv, all proteins mapped to that sequence are shown in the Mapped column (the Protein column shows the protein to which the peptide is assigned as razor).
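To list the peptides that are shared in this way, the peptide-level table can be filtered for rows with a non-empty mapped-proteins field. This is a minimal sketch that assumes tab-separated columns named `Peptide`, `Protein` and `Mapped Proteins`, with comma-separated entries in the last one; check the header of your own peptide.tsv and adjust the parameters if they differ. `shared_peptides` is a hypothetical helper, and the toy rows are made up.

```python
import csv
import io


def shared_peptides(tsv_file, peptide_col="Peptide", protein_col="Protein",
                    mapped_col="Mapped Proteins"):
    """Return {peptide: (razor_protein, [other_mapped_proteins])} for every row
    whose mapped-proteins field is non-empty. Column names are assumptions."""
    shared = {}
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        field = row.get(mapped_col) or ""
        mapped = [p.strip() for p in field.split(",") if p.strip()]
        if mapped:
            shared[row[peptide_col]] = (row[protein_col], mapped)
    return shared


# Hypothetical toy rows; with a real file use open("peptide.tsv", newline="").
toy = io.StringIO(
    "Peptide\tProtein\tMapped Proteins\n"
    "PEPTIDEAK\tPROT_A\tPROT_B\n"
    "PEPTIDECK\tPROT_C\t\n"
)
result = shared_peptides(toy)
print(result)
```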

In protein.tsv, there is an Indistinguishable protein column showing all other proteins that cannot be distinguished from the one shown in the Protein column.

If you have a shorter isoform and the longer one has an extra peptide, then the shorter one will not be listed as indistinguishable. You will see it in the peptide-level files, but not in the protein file. But if the shorter and longer isoforms are identified by exactly the same peptides, you should see both: one in the Protein column and the other in the Indistinguishable column.
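The rule described above can be illustrated with a simplified, set-based parsimony sketch: proteins with identical peptide sets collapse into one indistinguishable group, and a group whose peptides are a strict subset of another group's is treated as subsumed and left out of the protein-level report. This is a toy under stated assumptions, not ProteinProphet's algorithm, which works with peptide probabilities and weights rather than plain set containment; `group_proteins` is a hypothetical name.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Toy set-based parsimony (not ProteinProphet's probabilistic model).

    Returns (reported, subsumed): lists of protein groups, each a sorted list.
    """
    # Invert the mapping: protein -> set of peptides that map to it.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Proteins with identical peptide sets are indistinguishable.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    reported, subsumed = [], []
    for peptides, members in groups.items():
        # '<' on sets is a strict-subset test, so a group never subsumes itself.
        if any(peptides < other for other in groups):
            subsumed.append(sorted(members))
        else:
            reported.append(sorted(members))
    return reported, subsumed


# Same peptides: both isoforms end up in one group (Protein + Indistinguishable).
print(group_proteins({"pep1": ["short", "long"], "pep2": ["short", "long"]}))
# The longer isoform has an extra peptide: the shorter one is subsumed.
print(group_proteins({"pep1": ["short", "long"], "pep2": ["short", "long"],
                      "pep3": ["long"]}))
```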

Alexey

From: JB91451 notifications@github.com Sent: Tuesday, December 10, 2019 7:35 AM To: Nesvilab/philosopher philosopher@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [Nesvilab/philosopher] Protein grouping in report tables (#93)

Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

JB91451 commented 4 years ago

Dear Alexey,

Thank you for the clarification. This is the behaviour I intuitively assumed. However, protein.tsv seems to behave differently. I have enclosed a shortened FASTA file with the two protein sequences in question, along with peptide.tsv and protein.tsv. In the peptide table there are 6 peptides mapped to protein "Seq_14258" (protein A in the example above), and exactly the same peptides, and no others, are mapped to protein "Seq_71829" (protein B). Yet the latter protein is completely absent from the protein table.

Juergen

On Tuesday, 10-12-2019 at 15:05 Alexey Nesvizhskii wrote:

-- Juergen Bartel

University of Greifswald Center for Functional Genomics of Microbes Institute of Microbiology Department of Microbial Proteomics

Felix-Hausdorff-Str.8 17489 Greifswald

Phone: +49 (0)3834 - 420 - 5932 (/ - 5965) Fax: +49 (0)3834 - 420 - 5902


>sp|Seq_14258|Seq_14258
MAVVGAGAYLYNTARQKAGVTDDDVNIPGV
>sp|Seq_71829|Seq_71829
MTKVPLELLGIEVDTDDPGESAQNLGLGVIGVTLT
MAVVGAGAYLYNTARQKAGVTDDDVNIPGV

anesvi commented 4 years ago

Please open the combined.prot.xml file and find that group.

Is Seq_71829 listed as indistinguishable, in the same subgroup as Seq_14258, or as a separate subgroup? If the latter, what are the weights of those 6 peptides?

I suspect something distinguishes the two entries. It could be the tryptic status of the peptides.
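Tryptic status can be checked directly on the two shortened sequences from the FASTA excerpt above. The sketch below counts tryptic termini (0, 1 or 2) for a peptide at its first occurrence in a protein, assuming the usual trypsin rule (cleavage after K or R, not before P) and treating protein termini as tryptic. `tryptic_termini` is a hypothetical helper, and the two peptides are simply the tryptic products of the excerpt, not necessarily among the 6 observed ones.

```python
def tryptic_termini(peptide, protein):
    """Count tryptic termini (0-2) of `peptide` at its first occurrence in `protein`.

    Assumed rule: trypsin cuts after K/R unless the next residue is P,
    and protein N- and C-termini count as tryptic.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError(f"{peptide} not found in protein")
    end = start + len(peptide)
    n_ok = start == 0 or (protein[start - 1] in "KR" and protein[start] != "P")
    c_ok = end == len(protein) or (protein[end - 1] in "KR" and protein[end] != "P")
    return int(n_ok) + int(c_ok)


# Shortened sequences from the thread: B is A extended at the N-terminus.
SEQ_14258 = "MAVVGAGAYLYNTARQKAGVTDDDVNIPGV"
SEQ_71829 = "MTKVPLELLGIEVDTDDPGESAQNLGLGVIGVTLT" + SEQ_14258

for pep in ("MAVVGAGAYLYNTAR", "AGVTDDDVNIPGV"):
    print(pep, tryptic_termini(pep, SEQ_14258), tryptic_termini(pep, SEQ_71829))
```

Under these assumptions, MAVVGAGAYLYNTAR is fully tryptic in Seq_14258 (it starts at the protein N-terminus) but has a non-tryptic N-terminus in Seq_71829 (it is preceded by T), while AGVTDDDVNIPGV is fully tryptic in both. A difference like this could give the two entries different peptide evidence and split them apart.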

From: JB91451 notifications@github.com Sent: Tuesday, December 10, 2019 10:05 AM To: Nesvilab/philosopher philosopher@noreply.github.com Cc: Nesvizhskii, Alexey nesvi@med.umich.edu; Comment comment@noreply.github.com Subject: Re: [Nesvilab/philosopher] Protein grouping in report tables (#93)


anesvi commented 4 years ago

I mean: what are the weights of those peptides, if the entries are split into multiple subgroups as I suspect?
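Answering this means locating the group in combined.prot.xml and listing each subgroup's proteins and peptide weights. Below is a minimal sketch with Python's `xml.etree.ElementTree`, assuming the standard protXML layout (`protein_group` elements containing `protein` elements with a `group_sibling_id`, whose children are `indistinguishable_protein` and `peptide` elements carrying `peptide_sequence`, `charge` and `weight`). Namespaces are stripped so the prefix does not matter. `describe_group` is a hypothetical helper, and the embedded XML is a toy example with made-up names and weights; for a very large file, `ET.iterparse` would keep memory use lower.

```python
import io
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop an XML namespace: '{uri}protein' -> 'protein'."""
    return tag.rsplit("}", 1)[-1]


def describe_group(source, protein_name):
    """Return (group_number, subgroups) for the protein_group that contains
    protein_name, or None. Each subgroup lists its sibling id, protein names
    (including indistinguishable ones) and {(sequence, charge): weight}."""
    root = ET.parse(source).getroot()
    for group in root.iter():
        if _local(group.tag) != "protein_group":
            continue
        subgroups = []
        for prot in group:
            if _local(prot.tag) != "protein":
                continue
            names = [prot.get("protein_name")]
            weights = {}
            for child in prot:
                tag = _local(child.tag)
                if tag == "indistinguishable_protein":
                    names.append(child.get("protein_name"))
                elif tag == "peptide":
                    key = (child.get("peptide_sequence"), child.get("charge"))
                    weights[key] = float(child.get("weight", "nan"))
            subgroups.append({"sibling": prot.get("group_sibling_id"),
                              "proteins": names, "weights": weights})
        if any(protein_name in sg["proteins"] for sg in subgroups):
            return group.get("group_number"), subgroups
    return None


# Toy protXML with made-up names and weights, only to exercise the parser.
TOY = b"""<protein_summary xmlns="http://regis-web.systemsbiology.net/protXML">
  <protein_group group_number="1" probability="1.0">
    <protein protein_name="PROT_A" group_sibling_id="a" probability="1.0">
      <indistinguishable_protein protein_name="PROT_A2"/>
      <peptide peptide_sequence="PEPTIDEK" charge="2" weight="0.75"/>
    </protein>
    <protein protein_name="PROT_B" group_sibling_id="b" probability="0.2">
      <peptide peptide_sequence="PEPTIDEK" charge="2" weight="0.25"/>
    </protein>
  </protein_group>
</protein_summary>"""

result = describe_group(io.BytesIO(TOY), "PROT_B")
print(result)
```

If the two proteins come back as separate subgroups (different `group_sibling_id` values) with most of the shared peptides' weight on one of them, that matches the split suspected here; if one appears as an `indistinguishable_protein` of the other, the earlier explanation in this thread suggests both should show up together in protein.tsv.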


JB91451 commented 4 years ago

Dear Alexey,

Both proteins belong to the same protein group (group 358, pseudo_name = 2). I hope the XML excerpt copied below is readable.

Juergen

anesvi commented 4 years ago

OK, at this point I may need Felipe to look into this. But can you send us the whole info, like this:

From: JB91451 notifications@github.com Sent: Tuesday, December 10, 2019 10:21 AM To: Nesvilab/philosopher philosopher@noreply.github.com Cc: Nesvizhskii, Alexey nesvi@med.umich.edu; Comment comment@noreply.github.com Subject: Re: [Nesvilab/philosopher] Protein grouping in report tables (#93)
