compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

Suggestions for Ion mobility MS data #100

Open frankligy opened 8 months ago

frankligy commented 8 months ago

Hello @RalfG,

First, thank you very much for developing such a wonderful tool. I have really enjoyed working with it!

I have a question about how to apply MS2rescore to Bruker (.d) data or, more broadly, to ion mobility MS data (TIMS), in which features with the same RT are further separated in the gas phase.

In practice, the issue I am facing is that, when using MS2PIP to generate features, I cannot map the MaxQuant scan IDs to the raw scan IDs in the Bruker raw data (.d). It seems that MaxQuant accumulates the MS/MS scans along the ion mobility axis to produce a conventional spectrum, which is then submitted to the search engine (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7261821/). Because of that, my impression is that the current implementation of MS2rescore cannot handle this data well, since it relies on a one-to-one correspondence between a PSM and the raw MS/MS spectrum.

There are other conceptual challenges, such as the lack of a model trained specifically for timsTOF data. Technically, ion mobility should also be a predictable feature that may help with rescoring. In light of that, I would like to get your thoughts on:

[1] Whether my understanding is correct that the current implementation of MS2rescore is more focused on Thermo data and may not be applicable to other data, such as Bruker TIMS data.

[2] What difficulties do you foresee in adapting the models to ion mobility data?

[3] If I still want to use it, can I skip the MS2PIP step and use only DeepLC and other features, possibly including additional features from my own custom functions, to assist with rescoring? Do you think the identification rate would still increase? I ask because the MS2PIP features seem to contribute a lot to the prediction.

Thanks very much in advance, Frank

RalfG commented 8 months ago

Hi Frank,

Thank you for your interest in MS²Rescore!

Regarding your issue with MaxQuant, you are indeed right that we must receive a one-to-one relation between PSMs and spectra. In the case of aggregated spectra from a TIMSTOF, we require access to the aggregated spectra instead of the original ones.

So far, we have only tried MS²Rescore on TIMSTOF data analyzed with the PEAKS search engine, where PEAKS outputs both PSM files and MGF files with the aggregated spectra. Do you know if MaxQuant can similarly output the aggregated spectra in MGF or mzML formats?

In terms of prediction models, everything should be ready. In the upcoming v4.0 of MS²PIP (included in MS²Rescore v3), we have new specialized models for the TIMSTOF instruments. Both tryptic and non-tryptic peptides, including HLA peptides, are supported. You can configure MS²Rescore to use this model with the ms2pip configuration section:

"ms2rescore": {
        "feature_generators": {
            "ms2pip": {
                "model": "timsTOF"
            }
        }
}

Very recently, we have also added the ionmob ion mobility predictor as a feature generator. Installing MS²Rescore with the optional dependency (pip install --pre ms2rescore[ionmob]) should install everything you need. Then simply add "ionmob": {} to the feature_generators section of the configuration file:

"ms2rescore": {
        "feature_generators": {
            "ionmob": {}
        }
}
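
As a minimal sketch of how the two configuration fragments in this comment could be combined, the Python snippet below builds a dictionary with both the ms2pip and ionmob feature generators and serializes it to JSON. Only the keys shown above ("ms2rescore", "feature_generators", "ms2pip", "model", "ionmob") come from this thread; the function name build_config and its parameters are hypothetical, and a real configuration file will likely need further settings that are not covered here.

```python
import json


def build_config(ms2pip_model="timsTOF", use_ionmob=True):
    """Return a config dict with the feature generators discussed above.

    Hypothetical helper: only the key names are taken from the thread.
    """
    feature_generators = {"ms2pip": {"model": ms2pip_model}}
    if use_ionmob:
        # An empty object enables the ionmob generator with its defaults.
        feature_generators["ionmob"] = {}
    return {"ms2rescore": {"feature_generators": feature_generators}}


# Serialize to JSON text; writing it to a file is left to the caller.
config_text = json.dumps(build_config(), indent=4)
print(config_text)
```

The printed text can be saved under any file name and extended with the remaining options your setup needs.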

Let us know if we could look into the MaxQuant spectrum matching issue together. We would definitely like to help you out.

Best, Ralf

frankligy commented 8 months ago

Hi Ralf,

Thanks very much for getting back to me!

I really appreciate the effort on timsTOF prediction. Just to clarify, is the "timsTOF" model for tryptic or non-tryptic mode?

ionmob looks really cool. One question: I assume that, to enable automatic feature generation, we need the CCS values in the msms.txt file, right? Right now, it seems that the CCS values are not in MaxQuant's msms.txt but in the evidence.txt file, so I guess I first need to transfer the CCS values to msms.txt before running ms2rescore, right?
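
To make the transfer I have in mind concrete, here is a rough sketch using only the Python standard library. It assumes that evidence.txt has "id" and "CCS" columns and that msms.txt links to it through an "Evidence ID" column; these column names are my assumption and should be checked against the actual MaxQuant output. The function name add_ccs_to_msms and the toy data are hypothetical, and whether ms2rescore would pick up a CCS column from msms.txt is exactly the open question above.

```python
import csv
import io


def add_ccs_to_msms(msms_file, evidence_file, out_file):
    """Copy the CCS value of each evidence row onto its linked MS/MS rows."""
    # Assumed columns: evidence.txt has "id" and "CCS"; msms.txt has "Evidence ID".
    evidence = csv.DictReader(evidence_file, delimiter="\t")
    ccs_by_evidence_id = {row["id"]: row["CCS"] for row in evidence}

    msms = csv.DictReader(msms_file, delimiter="\t")
    fieldnames = list(msms.fieldnames) + ["CCS"]
    writer = csv.DictWriter(
        out_file, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in msms:
        # Rows without a matching evidence entry get an empty CCS value.
        row["CCS"] = ccs_by_evidence_id.get(row["Evidence ID"], "")
        writer.writerow(row)


# Toy stand-ins for the two tab-separated MaxQuant tables.
evidence_txt = "id\tSequence\tCCS\n0\tPEPTIDEK\t350.1\n1\tSAMPLER\t412.7\n"
msms_txt = "id\tSequence\tEvidence ID\n0\tPEPTIDEK\t0\n1\tSAMPLER\t1\n2\tPEPTIDEK\t0\n"

out = io.StringIO()
add_ccs_to_msms(io.StringIO(msms_txt), io.StringIO(evidence_txt), out)
print(out.getvalue())
```

With real files, the in-memory strings would be replaced by file handles opened with newline="".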

For the MaxQuant accumulated spectra, I opened an issue in their Google group (https://groups.google.com/g/maxquant-list/c/mztk0wyUg-w) but haven't heard back from them. I tried to figure it out myself, but with no luck. It seems that the Bruker raw data are laid out as below (I used ProteoWizard to convert to mzML):

spectrum1 frame1 scan1
spectrum2 frame1 scan2
...
spectrum_n frame1 scan_n
spectrum_n+1 frame2 scan1
...
...
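
Read literally, the layout above maps a running spectrum number to a (frame, scan) pair. The short sketch below expresses that mapping in Python under the strong assumption that every frame contains the same number of scans, n, and that no scans are skipped; converted files may not satisfy this, so it only illustrates the layout as I understand it. The function name spectrum_to_frame_scan is hypothetical.

```python
def spectrum_to_frame_scan(spectrum_index, scans_per_frame):
    """Map a 1-based spectrum number to 1-based (frame, scan) numbers.

    Assumes a fixed number of scans per frame and no missing scans.
    """
    if spectrum_index < 1 or scans_per_frame < 1:
        raise ValueError("spectrum_index and scans_per_frame must be >= 1")
    frame, scan = divmod(spectrum_index - 1, scans_per_frame)
    return frame + 1, scan + 1


# With n = 3 scans per frame, spectrum 4 is the first scan of frame 2.
print(spectrum_to_frame_scan(4, 3))
```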

MaxQuant has accumulatedMsmsscan and pasefMsmsScans in its txt output, but it is not obvious to me how to reproduce exactly how the accumulation is done. I understand this is definitely not part of your job as the ms2rescore developer, but if you happen to have any ideas, or a chance to analyze a public Bruker .d file with MaxQuant, your insights would be really appreciated, and I am sure they would benefit other users who try ms2rescore.

Thanks again, Frank