Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

SequenceWindow, Ile over-representation on the n-term side #500

Open pisistrato opened 1 month ago

pisistrato commented 1 month ago

Hi,

I was inspecting the files generated in the tmt-report folder. I noticed a suspicious over-representation of Ile in the SequenceWindow on the n-term side, i.e. before the detected peptide sequence. Can you comment on how that is calculated? It might be real, but I was expecting a Leu to be over-represented...

fcyu commented 1 month ago

It just uses the sequence of the assigned protein in the fasta file.

Best,

Fengchao

pisistrato commented 1 month ago

Since it was very strange, I checked the sequences manually, this is what I see:

| ProteinID | SequenceWindow | Start | SequenceWindowFromFasta | Fasta |
| -- | -- | -- | -- | -- |
| A0A8I5KX85 | TAPVQAPPAP | 148 | TAPVQAPPAP | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | AIKIQLDNQY | 239 | ALKLQLDNQY | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | NQAIKLQLDN | 237 | NQALKLQLDN | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |
| A0A8I5KX85 | FPSIQSTAKH | 199 | FPSLQSTAKH | xxxxxMAETEERSLDNFFAKRDKKKKKERSNRAASAAGAAGSAGGSSGAAGAAGGGAGAGTRPGDGGTASAGAAGPGAATKAVTKDEDEWKELEQKEVDYSGLRVQAMQISEKEEDDNEKRQDPGDNWEEGGGGGGGMEKSSGPWNKTAPVQAPPAPVIVTETPEPAMTSGVYRPPGARLTTTRKTPQGPPEIYSDTQFPSLQSTAKHVESRKDKEMEKSFEVVRHKNRGRDEVSKNQALKLQLDNQYAVLENQKSSHSQYNxxxxx |

The first one is correct, the others are not. FYI, Start refers to the starting position excluding the xxxxx.

fcyu commented 1 month ago

The second one is also correct because there is a peptide ALKLQLDNQY in the protein. We don't distinguish I and L when mapping peptides to proteins because they have identical masses.

Best,

Fengchao
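For readers following along, the I/L-agnostic mapping described in this reply can be illustrated with a minimal Python sketch. It is not Philosopher's actual code: the helper names `normalize_il` and `find_il_agnostic` are hypothetical, and collapsing L onto I before a plain substring search is just one simple way to treat the two isobaric residues as equal.

```python
def normalize_il(seq: str) -> str:
    """Collapse Leu (L) onto Ile (I) so the two isobaric residues compare equal."""
    return seq.upper().replace("L", "I")


def find_il_agnostic(peptide: str, protein: str) -> int:
    """Return the 0-based index of the first I/L-agnostic match, or -1 if none."""
    return normalize_il(protein).find(normalize_il(peptide))


# Short fragment of A0A8I5KX85, copied from the Fasta column in the table above.
fragment = "SKNQALKLQLDNQYAVLEN"

# An exact search for the reported window fails, the I/L-agnostic one succeeds.
print(fragment.find("AIKIQLDNQY"))
print(find_il_agnostic("AIKIQLDNQY", fragment))
```

Note that normalization is only needed to find the match position. The residues shown to the user can still be sliced from the original, unnormalized protein string at that position.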

pisistrato commented 1 month ago

Indeed, ALKLQLDNQY would be right, but FragPipe reports AIKIQLDNQY. To me it seems that all L are converted to I in the fasta file used to create the SequenceWindow.

Case closed :)
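A quick way to check this kind of discrepancy is to diff each reported window against the window taken from the FASTA, residue by residue. The sketch below is only an illustration: `mismatches` is a hypothetical helper, and the four pairs are the SequenceWindow and SequenceWindowFromFasta values copied from the table above.

```python
def mismatches(reported: str, expected: str) -> list[tuple[int, str, str]]:
    """List (0-based index, reported residue, expected residue) for each difference."""
    if len(reported) != len(expected):
        raise ValueError("windows must have the same length")
    return [(i, r, e) for i, (r, e) in enumerate(zip(reported, expected)) if r != e]


# (SequenceWindow, SequenceWindowFromFasta) pairs from the table in this thread.
rows = [
    ("TAPVQAPPAP", "TAPVQAPPAP"),
    ("AIKIQLDNQY", "ALKLQLDNQY"),
    ("NQAIKLQLDN", "NQALKLQLDN"),
    ("FPSIQSTAKH", "FPSLQSTAKH"),
]

for reported, expected in rows:
    print(reported, expected, mismatches(reported, expected))
```

On these four rows every mismatch is an I reported where the FASTA has an L. Some L residues in the affected windows are left unchanged, for example the last two in NQAIKLQLDN.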


fcyu commented 1 month ago

Yes, this is a known bug: https://github.com/Nesvilab/philosopher/issues/430