Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running an MSFragger- and Philosopher-powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org
Other

Questions about searching the Pierce iRT peptides. #1673

Open vindr20 opened 1 month ago

vindr20 commented 1 month ago

- Upload your log file (If a log file hasn't been generated, go to the 'Run' tab in FragPipe, click 'Export Log', zip the resulting "log_[date_time].txt" file to avoid truncation, then attach the zipped file by drag & drop here.) log_2024-07-14_20-07-56.txt log_2024-07-14_20-27-43.txt

- Describe the issue or question: I'm having issues working with Pierce iRT standards in my samples. In general, FragPipe seems to have a lot of trouble identifying them (0 or 1 peptide IDs), even when I inject pure standards and add C-terminal heavy lysine/arginine as fixed modifications. I've tested both DIA and DDA methods, and manual examination in Skyline shows quite convincing spectra acquired with both methods. My FASTA file currently contains just the Pierce standards plus decoys and contaminants, but I have also experienced this issue with a full H. sapiens FASTA with the Pierce standards appended.

Could you please advise me as to what, if anything, I may be doing incorrectly?

fcyu commented 1 month ago

It seems that something is wrong with your LC-MS files or FASTA file. Some hints:

DDA

[progress: 262/262 (100%) - 2278 spectra/s] 0.1s | remapping alternative proteins and postprocessing 0.2 s

DIA

[progress: 3169/3169 (100%) - 9632 spectra/s] 0.3s

There are too few scans in both DDA and DIA.

If you like, could you upload your FASTA file and raw files to https://www.dropbox.com/request/0OzwbMC4xGe8PQCUBqJB ? I will take a closer look.

Best,

Fengchao

vindr20 commented 1 month ago

I've uploaded my raw files and the Pierce retention time standards FASTA. In FragPipe, I add decoys and contaminants before searching.

The number of scans seems about right to me though; we have an older/slower instrument (QE+) and these are short runs, so DIA doesn't generate that many scans, and I don't think DDA would be expected to trigger many acquisitions when the sample is pure standards. Let me know if I misunderstood your point though.

In case it is useful: I also tried spiking the standard peptides into a standard digest and analyzing it over a longer gradient with DIA, with similar issues; I can also share those files if you'd like.

Thank you for your help! I really do appreciate it.

fcyu commented 1 month ago

Thanks for uploading your files. The asterisks (*) in your FASTA file broke the program:

>PIERCE_88320
SSAAPPPPPR*
GISNEGQNASIK*
HVLTSIGEK*
DIPVPKPK*
IGDYAGIK*
TASEFDSAIAQDK*
SAAGAFGPELSR*
ELGQSGVDTYLQTK*
GLILVGGYGTR*
GILFVGSGVSGGEEGAR*
SFANQPLEVVYSK*
LTILEELR*
NGFILDGFPR*
ELASGLSFPVGFK*
LSSEAPALFQFDLK

After removing the asterisks, FragPipe detected all 15 iRT peptides: log_2024-07-15_22-42-22.txt peptide.zip
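For anyone hitting the same problem, a minimal pre-search check might look like the sketch below. It assumes the trailing `*` is a stop-codon marker left by another tool and simply strips it from sequence lines, then reports any letters outside the 20 canonical amino acids so that other surprises show up before searching. The function name `clean_fasta` is hypothetical and is not part of FragPipe.

```python
# Hypothetical pre-search FASTA check; not a FragPipe feature.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def clean_fasta(text: str) -> tuple[str, set[str]]:
    """Strip '*' from sequence lines; return cleaned text and non-canonical letters seen."""
    out_lines, odd = [], set()
    for line in text.splitlines():
        if line.startswith(">"):
            # Header lines are passed through untouched.
            out_lines.append(line)
            continue
        seq = line.strip().replace("*", "")
        # Collect anything that is not one of the 20 standard residues.
        odd.update(set(seq) - CANONICAL)
        out_lines.append(seq)
    return "\n".join(out_lines) + "\n", odd
```

For example, running `clean_fasta` on the contents of the Pierce FASTA above would return the same entry without asterisks and an empty set of non-canonical letters.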

Best,

Fengchao

vindr20 commented 1 month ago

Thank you! I was using a FASTA file from another software pipeline and didn't look closely enough at it to notice that it was atypical.

Removing the asterisks enabled FragPipe to find these peptides in the DDA data, as expected.

If it's acceptable to ask a follow-up question: is there a way to search a sample with these peptides spiked in without enabling variable modifications for the C-terminal heavy label across the whole proteome? I notice that I get a few IDs for proteins with heavy isotopic labels, which is obviously incorrect, and the search generally finds fewer proteins/peptides. But if I don't specify the heavy label as a fixed or variable modification, I can't find the standard peptides at all.

It seems to me that it would be better to search a database in which the proteome yields only light peptides but the heavy peptide standards are still included, but I can't find an option for that.

fcyu commented 1 month ago

You can do that with a small trick:

  1. Change the heavy K to B in your fasta file
  2. Change the heavy R to J in your fasta file
  3. Set the fixed modification of B and J to the mass of heavy K and R, respectively
  4. In the digest rules, change the cleavage sites from KR to KRBJ. Alternatively, put each iRT peptide in a separate protein entry.
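The sequence side of this trick can be sketched as follows. The sketch swaps only the C-terminal K/R of each iRT peptide for B/J, then runs a simple in-silico digest to show why step 4 matters: with cleavage at KR only, the concatenated Pierce protein is never cut, while KRBJ releases all 15 peptides. The digest applies the classic trypsin rule (no cleavage before P), which is an assumption; FragPipe's enzyme settings may differ. All helper names are hypothetical.

```python
# Sketch of the B/J substitution trick (helper names are hypothetical).
IRT_PEPTIDES = [
    "SSAAPPPPPR", "GISNEGQNASIK", "HVLTSIGEK", "DIPVPKPK", "IGDYAGIK",
    "TASEFDSAIAQDK", "SAAGAFGPELSR", "ELGQSGVDTYLQTK", "GLILVGGYGTR",
    "GILFVGSGVSGGEEGAR", "SFANQPLEVVYSK", "LTILEELR", "NGFILDGFPR",
    "ELASGLSFPVGFK", "LSSEAPALFQFDLK",
]

# Placeholder letters standing in for heavy C-terminal residues.
HEAVY_CODE = {"K": "B", "R": "J"}


def relabel_c_terminus(peptide: str) -> str:
    """Swap only the C-terminal K/R for B/J; internal residues stay light."""
    return peptide[:-1] + HEAVY_CODE.get(peptide[-1], peptide[-1])


def tryptic_digest(sequence: str, sites: str = "KRBJ") -> list[str]:
    """Cleave after any residue in `sites` unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in sites and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def separate_entries(peptides: list[str], prefix: str = "iRT") -> str:
    """Alternative for step 4: one FASTA entry per peptide, so no B/J cleavage is needed."""
    return "".join(f">{prefix}_{n:02d}\n{p}\n" for n, p in enumerate(peptides, start=1))


relabeled = [relabel_c_terminus(p) for p in IRT_PEPTIDES]
protein = "".join(relabeled)
print(len(tryptic_digest(protein, "KRBJ")))  # 15 peptides
print(len(tryptic_digest(protein, "KR")))    # 1: the whole concatenated protein
```

Note that DIPVPKPK keeps its internal light K (followed by P) and only its last residue becomes B.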

Best,

Fengchao

vindr20 commented 1 month ago

I tried this, but it seems that specifying custom amino acids breaks DIA-NN. Log file attached: log_2024-07-16_17-31-20.txt

I also tried defining heavy lysine/arginine as modifications on B and J in the DIA-NN command-line options, but it didn't seem to help.

fcyu commented 1 month ago

It is not DIA-NN; it is MSBooster. @yangkl96

Best,

Fengchao

fcyu commented 1 month ago

@yangkl96 Any updates about this MSBooster error?

Thanks,

Fengchao

yangkl96 commented 1 month ago

Sorry, I just saw this. MSBooster is not currently equipped to handle custom amino acids. I can implement this right now and get back to you ASAP.

yangkl96 commented 1 month ago

Hi @vindr20 ,

Attached below is a new MSBooster version that should support B and J. Please let us know if this works for you

https://www.dropbox.com/scl/fi/9v0men3eae218icysokfd/MSBooster-1.2.39.jar?rlkey=axfwxfbkxpec0fjl51htunaql&dl=0

Best, Kevin

vindr20 commented 1 month ago

Thank you for your help! I don't seem to have permissions/access to that dropbox link though. Could you adjust it so I can access the files?

yangkl96 commented 1 month ago

Yes, you should have permissions now: https://www.dropbox.com/scl/fi/9v0men3eae218icysokfd/MSBooster-1.2.39.jar?rlkey=axfwxfbkxpec0fjl51htunaql&dl=0

vindr20 commented 1 month ago

Okay, I had a chance to try this. Unfortunately, the pipeline still breaks, albeit further along this time. Log file attached. If I had to guess from looking at it, EasyPQP doesn't know how to handle the new amino acids either. log_2024-07-24_11-39-08.txt

I did confirm that disabling the fixed modifications on B/J and setting trypsin to cleave only at 'KR' allowed the pipeline to run as usual.

fcyu commented 1 month ago

Thank you so much for the testing.

The error occurs because EasyPQP doesn't support noncanonical amino acids. I have fixed it (https://github.com/grosenberger/easypqp/commit/17d49cd184baaf9d0581366636bfd4231d4057a5) and released a new version. Could you upgrade EasyPQP in the FragPipe "config" tab and try again?

Thanks,

Fengchao

vindr20 commented 1 month ago

I updated EasyPQP to 0.1.48 and tried again, but it still failed. Log file attached. log_2024-07-24_15-21-37.txt

fcyu commented 1 month ago

I apologize for the oversight. I should have tested it before pushing the commits.

It is actually more complicated than I thought. I pushed a new commit, https://github.com/Nesvilab/easypqp/commit/83247ba8cc2bf79355325c5f0923681cb8da2b71, to try to fix it, but OpenMS, a C++ library used by EasyPQP, threw another error:

RuntimeError: the value 'B' was used but is not valid; Modification '': origin must be a letter from A to Y, excluding B and J.

Changing the C++ library is complicated because it requires coordinating with the whole OpenMS team. I have submitted a ticket at https://github.com/OpenMS/OpenMS/issues/7554. Let's hope that they will implement this feature soon.

For now, you could use U and O for labeled K and R, respectively. Note that U has a non-zero mass of 150.95363 and O has a non-zero mass of 237.14773, so you need to set each fixed modification equal to the mass difference between the labeled K/R and U/O.
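The arithmetic for these fixed modifications can be sketched as below. It assumes the Pierce standards carry 13C6 15N2 lysine (+8.014199 Da) and 13C6 15N4 arginine (+10.008269 Da); please check the product sheet for your lot. The U and O masses are the ones quoted above, and B and J are treated as zero-mass placeholders, as implied by setting their fixed modification to the full heavy residue mass. Under these assumptions the U and O offsets come out to about -14.8445 Da and -71.0384 Da. Function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of light lysine and arginine.
LIGHT_RESIDUE = {"K": 128.094963, "R": 156.101111}

# Assumed labels: 13C6 15N2 lysine and 13C6 15N4 arginine (verify for your standards).
LABEL_SHIFT = {"K": 8.014199, "R": 10.008269}

# Base masses of the placeholder letters, as described in the thread:
# B and J are treated as zero-mass; U and O carry non-zero residue masses.
PLACEHOLDER_MASS = {"B": 0.0, "J": 0.0, "U": 150.95363, "O": 237.14773}


def heavy_residue_mass(aa: str) -> float:
    """Mass of the labeled residue: light residue plus isotope label shift."""
    return LIGHT_RESIDUE[aa] + LABEL_SHIFT[aa]


def fixed_mod_for(placeholder: str, aa: str) -> float:
    """Fixed modification to put on `placeholder` so it weighs as much as heavy `aa`."""
    return round(heavy_residue_mass(aa) - PLACEHOLDER_MASS[placeholder], 6)


for placeholder, aa in [("B", "K"), ("J", "R"), ("U", "K"), ("O", "R")]:
    print(f"{placeholder} (heavy {aa}): {fixed_mod_for(placeholder, aa):+.6f} Da")
```

The negative values for U and O are expected: both placeholder residues are heavier than the labeled K/R they stand in for.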

Let me know if you have any questions or get any errors when running FragPipe.

Best,

Fengchao

vindr20 commented 1 month ago

I tried using O and U and successfully identified several Pierce standards spiked into a sample, but to be honest, this approach does not seem to perform as well as allowing heavy C-terminal residues as a variable modification. To elaborate:

  1. Using O/U to describe heavy lysine/arginine resulted in fewer identified standard peptides (6) than allowing variable heavy K/R C-termini globally (14 peptides identified). I'm not sure why this is, but it is very problematic.
  2. When using the Skyline export feature, Skyline does not import any of these standard peptides, even though all of them are easily found manually. Presumably this is because Skyline does not support O/U. This wasn't a huge problem because I knew what to look for, but this approach means that I lose the benefit of importing predicted spectra for the standard peptides.
  3. Using the spike-in standards for retention time alignment fails when using O/U to encode the heavy lysine/arginine. Log file attached for this one. log_2024-07-24_19-46-26.txt

I tend to think it would be more elegant if there were a way to specify protein-specific modifications; that way only the standards would be modified, and all software involved would agree that it was looking at heavy lysine/arginine. I know MaxQuant has that feature, but I suspect it's not trivial to implement.
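To make the idea of a protein-specific modification concrete, here is a purely hypothetical sketch; it is not a FragPipe, MSFragger, or MaxQuant API. Each rule carries a protein accession prefix, and its mass shift is applied only to peptides from matching proteins, so proteome peptides stay light while the Pierce standards get their heavy C-terminal residue. The label masses are the same assumed 13C6 15N2 K and 13C6 15N4 R shifts used elsewhere in this thread's examples.

```python
from dataclasses import dataclass


@dataclass
class ProteinScopedMod:
    """Hypothetical rule: a mass shift applied only within proteins matching a prefix."""
    accession_prefix: str
    residue: str
    delta: float
    c_term_only: bool = True


def residue_shifts(accession: str, peptide: str,
                   rules: list[ProteinScopedMod]) -> dict[int, float]:
    """Return {position: total mass shift} for one peptide from one protein."""
    shifts: dict[int, float] = {}
    for rule in rules:
        # Rules scoped to other proteins never touch this peptide.
        if not accession.startswith(rule.accession_prefix):
            continue
        positions = [len(peptide) - 1] if rule.c_term_only else range(len(peptide))
        for i in positions:
            if peptide[i] == rule.residue:
                shifts[i] = shifts.get(i, 0.0) + rule.delta
    return shifts


# Assumed heavy labels, scoped to the Pierce entry only.
RULES = [
    ProteinScopedMod("PIERCE", "K", 8.014199),
    ProteinScopedMod("PIERCE", "R", 10.008269),
]
```

With these rules, `residue_shifts("PIERCE_88320", "DIPVPKPK", RULES)` shifts only the final K, while a human peptide such as LVNEVTEFAK gets no shift at all.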

In any case, thank you for your help! I hope this is an area that can see active development; if the software can take advantage of them, these spike-in standards have a lot of value for some of our clinical test R&D.

fcyu commented 1 month ago

Yes, I agree. It seems that using noncanonical amino acids to replace the labeled ones is not ideal. We will discuss whether we can implement protein-specific modifications easily.

Best,

Fengchao