MannLabs / alphapept

A modular, python-based framework for mass spectrometry. Powered by nbdev.
https://mannlabs.github.io/alphapept/
Apache License 2.0

semi-tryptic? (help wanted) #534

Closed yc386 closed 1 year ago

yc386 commented 1 year ago

Hello,

Thank you for making your source code available. I have been running AlphaPept on Google Colab, and it is working well!

However, I am working on ancient proteins and complex food proteomes. Semi-tryptic digestion is often used to evaluate peptide preservation and extraction protocols.

I noticed that it is currently not possible to process FASTA files using semi-tryptic digestion. I had a look at your constants.ipynb. Your regular expression for trypsin, "(KR)", seems to cut after K or R except when P is in position P1', and trypsin_full, "(KR)|((?<=W)K(?=P))|((?<=M)R(?=P))", appears to additionally allow cutting before P under two conditions (K preceded by W, or R preceded by M).

Would you be willing to add a semi-tryptic option to your constants?

Thank you and happy new year.

straussmaximilian commented 1 year ago

Hi,

I am unfortunately not familiar with the exact rules for semi-tryptic digestion. Could you elaborate on this, and perhaps provide a regular expression?

I implemented the proteases according to the Expasy rules: https://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html.

We additionally have a non-specific option, which cuts at every amino acid: protease_dict["non-specific"] = "()". For this, you would need to set the number of missed cleavages to the maximum peptide length.


Note that this will generate a lot of potential sequences, and you might run into memory issues.
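
To give a feeling for the scale, here is a minimal sketch in plain Python (not AlphaPept code). It shows that the empty pattern "()" matches at every position of a sequence, and counts how many candidate peptides a single protein yields when every position is a cut site. The helper name count_nonspecific_peptides and the length limits used are illustrative assumptions.

```python
import re

# The empty group matches at every position, including both termini,
# so every position becomes a potential cut site.
positions = [m.start() for m in re.finditer("()", "PEPTIDE")]


def count_nonspecific_peptides(protein_length, min_len, max_len):
    """Count substrings of a protein whose length lies in [min_len, max_len].

    Hypothetical helper for illustration only; not part of AlphaPept.
    """
    longest = min(max_len, protein_length)
    # A protein of length n contains (n - k + 1) substrings of length k.
    return sum(protein_length - k + 1 for k in range(min_len, longest + 1))


if __name__ == "__main__":
    print(positions)
    print(count_nonspecific_peptides(500, 6, 30))
```

The count grows roughly with protein length times the width of the allowed length window, which is why a full FASTA file can exhaust memory.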

yc386 commented 1 year ago

Hello,

Thank you for this, and yes, the non-specific digestion does include semi-tryptic peptides. Semi-tryptic digestion requires a tryptic cleavage (after K, lysine, or R, arginine) at one end of the peptide, while the other end may result from a cut at any residue.
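
As a rough illustration of that definition, the sketch below enumerates semi-tryptic peptides in plain Python. It is not AlphaPept code: the function names are hypothetical, cleavage is assumed to happen after every K or R (no proline rule), protein termini are treated as valid tryptic ends, and missed cleavages are not considered.

```python
import re


def cut_sites(sequence, pattern=r"[KR]"):
    """Return sorted cut positions, including both protein termini."""
    sites = {m.end() for m in re.finditer(pattern, sequence)}
    return sorted(sites | {0, len(sequence)})


def semi_tryptic_peptides(sequence, min_len=1, max_len=50):
    """Enumerate peptides with at least one tryptic end (zero missed cleavages).

    Hypothetical helper for illustration only.
    """
    sites = cut_sites(sequence)
    peptides = set()
    for start, end in zip(sites, sites[1:]):
        # Tryptic N-terminus, C-terminus anywhere inside the fragment.
        for j in range(start + 1, end + 1):
            peptides.add(sequence[start:j])
        # N-terminus anywhere inside the fragment, tryptic C-terminus.
        for i in range(start, end):
            peptides.add(sequence[i:end])
    return sorted(p for p in peptides if min_len <= len(p) <= max_len)


if __name__ == "__main__":
    print(semi_tryptic_peptides("AKCDR"))
```

For "AKCDR", the peptide "D" is excluded because neither of its ends is a tryptic site, whereas a non-specific search would include it.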

I edited 00_settings.ipynb to change the .yaml configuration and ran the non-specific search on a Linux server with c. 81 GB RAM. It worked, and the results look reasonable compared with the MaxQuant outputs (fully and semi-tryptic digestion).


However, would you be happy to add trypsin/p digestion, as in MaxQuant? I think the regular expression for trypsin/p would be [KR] (cleaving after either K or R, without the proline restriction).
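
A small sketch of the difference, using only Python's re module. The digest helper is hypothetical, and the lookahead pattern for the proline rule is my assumption of how "no cut before P" can be written; it is not copied from AlphaPept's constants.

```python
import re

# Assumed pattern for trypsin with the proline rule: cut after K or R
# only when the next residue is not P.
TRYPSIN = r"[KR](?=[^P])"
# Trypsin/P: cut after every K or R.
TRYPSIN_P = r"[KR]"


def digest(sequence, pattern):
    """Split a sequence after each match of the pattern (no missed cleavages).

    Hypothetical helper for illustration only.
    """
    cuts = [m.end() for m in re.finditer(pattern, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Skip the empty fragment that appears when the sequence ends in K or R.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


if __name__ == "__main__":
    print(digest("AAKPGGRTT", TRYPSIN))
    print(digest("AAKPGGRTT", TRYPSIN_P))
```

With the proline rule the K-P bond stays intact, so the first fragment is "AAKPGGR"; with [KR] alone the same sequence is also cut between K and P.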

Thanks again!

straussmaximilian commented 1 year ago

Hi, this looks promising. Yes, I will include the trypsin/p digestion for the next release!

yc386 commented 1 year ago

Thank you!🥳