compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

Lots of G's in X!Tandem screen output #299

Closed StevenVerbruggen closed 3 years ago

StevenVerbruggen commented 3 years ago

Hi Marc and Harald,

I am currently reanalyzing melanoma data (the test described here was performed on file 20141212_QEp7_MiBa_SA_HLA-I-p_MM16_1_A.raw of ProteomeXchange dataset PXD004894). The RAW file was first converted to mgf and mzML using ThermoRawFileParser (http://compomics.github.io/projects/ThermoRawFileParser).

I use the mgf file in an X!Tandem search:

nohup java -cp /home/steven/tools/SearchGUI-4.0.18/SearchGUI-4.0.18.jar eu.isas.searchgui.cmd.SearchCLI -spectrum_files /data2/steven/melanoma_proteomics_bassinisternberg/mel16/mgf/20141212_QEp7_MiBa_SA_HLA-I-p_MM16_1_A.mgf -fasta_file /data2/steven/melanoma_proteomics_bassinisternberg/mel16/fasta/comb_fasta_rnaseq_uniprot_hspvdb_filtered_crap_concatenated_target_decoy.fasta -output_folder /data2/steven/melanoma_proteomics_bassinisternberg/mel16/xtandem_out/ -id_params /data2/steven/melanoma_proteomics_bassinisternberg/mel16/bassinisternberg_mel16.par -xtandem 1 -comet 1 -threads 5 > nohup_searchgui_xtandem_comet.txt &

This generates the following screen output in the nohup file:

Thu Apr 22 15:26:15 CEST 2021 Importing spectrum files.
Thu Apr 22 15:26:15 CEST 2021 Importing spectrum files completed (108.0 milliseconds).

Processing: 20141212_QEp7_MiBa_SA_HLA-I-p_MM16_1_A.mgf (1/1) 
Thu Apr 22 15:26:15 CEST 2021 Converting spectrum file 20141212_QEp7_MiBa_SA_HLA-I-p_MM16_1_A.mgf to peak list. 
10% 20% 30% 40% 50% 60% 70% 80% 90%

xtandem command: 
/home/steven/tools/SearchGUI-4.0.18/resources/XTandem/linux/linux_64bit/tandem /home/steven/tools/SearchGUI-4.0.18/resources/temp/search_engines/xtandem/input_searchGUI.xml 

Thu Apr 22 15:26:30 CEST 2021 Processing 20141212_QEp7_MiBa_SA_HLA-I-p_MM16_1_A.mgf with X!Tandem.

X! TANDEM Vengeance (2015.12.15.2)

Loading spectra (mgf).............................. loaded.
Spectra matching criteria = 58797
Starting threads ..... started. 
Computing models:
G
G
G
G
G
G
G
G
G
G
G
G

The list of lines containing only a 'G' keeps growing for as long as the search runs, and the normal progress messages do not appear in the screen output. Any idea what could cause this stream of G's? I used X!Tandem in SearchGUI 4.0.18 before on other data, and that gave the normal screen output. I also ran X!Tandem on the mzML file, with the same result. Searching the mzML file with MSGF+ in SearchGUI seems to behave normally (the run is still in progress, but I get the expected progress messages there: work split into 15 tasks, 1.78% currently complete). I have also attached the parameter file I used (the extension was changed from 'par' to 'txt' to comply with the GitHub upload guidelines): bassinisternberg_mel16.txt

Last week I performed the X!Tandem search on an mgf conversion of the same RAW file, that time generated with MSConvert. That also gave the stream of G's, and I let the search run for 5 days without any progress messages among the G's (a quick search through the nohup file showed only G's) and without any end result. This is for a peptidomics project, and we saw better results with X!Tandem than with MSGF+ earlier, so would be nice to r

Thanks in advance for any insight into this problem. If you need any more info, please let me know.

Best regards, Steven
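
The quick check of the nohup file described above (looking for anything other than G's) can be sketched in Python. This is only an illustration: the function name `summarize_log` is made up for this example, and treating a line as a "G line" only when it consists of the single letter G is an assumption based on the output pasted in the post.

```python
from collections import Counter


def summarize_log(lines):
    """Return (g_only, other): counts of lines that are exactly 'G'
    and of all other non-empty lines."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no information
        counts["g_only" if stripped == "G" else "other"] += 1
    return counts["g_only"], counts["other"]


# Small sample modelled on the screen output shown in the post.
sample = [
    "Starting threads ..... started.\n",
    "Computing models:\n",
    "G\n",
    "G\n",
    "\n",
    "G\n",
]
g_only, other = summarize_log(sample)
print(f"G-only lines: {g_only}, other non-empty lines: {other}")
```

Because the function accepts any iterable of lines, an open file handle for the nohup file (for example `nohup_searchgui_xtandem_comet.txt` from the command above) can be passed in directly instead of the sample list.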

hbarsnes commented 3 years ago

Hi Steven,

The long list of G's in the X!Tandem output usually indicates an issue with the search settings. The problem here seems to be the use of the "Acetylation of peptide N-term" modification. I will dig a bit deeper and get back to you. But for now you should be able to simply remove the "Acetylation of peptide N-term" modification when running X!Tandem, as X!Tandem by default searches for these in its second pass search anyway.

We do not have a lot of experience with non-enzymatic searches though, so there may be other X!Tandem parameters that you want to fine tune as well.

Best regards, Harald

hbarsnes commented 3 years ago

Hi again,

I have to take back that point about X!Tandem adding the "Acetylation of peptide N-term" in its second pass search. That is only for "protein N-term" (https://www.thegpm.org/TANDEM/api/pqa.html).

But in any case, it would be great if you could try without "Acetylation of peptide N-term" just to see if the search then completes? At least there are no G's printed on my end after this change.

Best regards, Harald

StevenVerbruggen commented 3 years ago

Hi Harald,

Thanks for your swift input. The intention was actually to include "Acetylation of protein N-term" as a variable mod instead of "Acetylation of peptide N-term". It appears that I copied the wrong line from the mods list. I have now started an X!Tandem run with protein N-term acetylation in the parameter file instead of peptide N-term acetylation. Half an hour into the run, there is no stream of G's yet. I will keep you updated when the run is completed, but it looks good. If it does not work this way after all, I will try another run with neither protein nor peptide N-term acetylation and check what that brings.

Thanks again. Have a nice weekend,

Steven

hbarsnes commented 3 years ago

Hi Steven,

Thanks for the update. In any case, I'm now in the process of releasing a new version that should also allow you to search with the acetylation on the peptide N-term. There was a bug in which we ended up converting the acetylation on the peptide N-term into acetylation on any amino acid, thus explaining why X!Tandem was not very happy. :)
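
To give a rough feel for why a variable modification on any amino acid is so much heavier than one on the peptide N-term only, the sketch below counts the modified and unmodified forms of a single peptide. It is a back-of-the-envelope illustration, not a description of how X!Tandem actually enumerates or prunes candidates; the peptide sequence and the function name `count_variable_forms` are hypothetical.

```python
def count_variable_forms(candidate_sites: int) -> int:
    """Each candidate site is either modified or not,
    so k independent sites give 2**k possible forms."""
    return 2 ** candidate_sites


peptide = "ACDEFGHIK"  # hypothetical 9-residue peptide, for illustration only

n_term_only = count_variable_forms(1)             # one site: the N-terminus
any_residue = count_variable_forms(len(peptide))  # every residue is a site

print(n_term_only, any_residue)  # 2 512
```

In a non-enzymatic search, where the number of candidate peptides is already very large, this per-peptide factor applies to every candidate.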

I will close this issue, but don't hesitate to open a new one if you come across other problems.

Best regards, Harald