bzhanglab / PepQuery

PepQuery: a targeted peptide search engine
http://pepquery.org
GNU General Public License v3.0

mgf doesn't exist #68

Closed anuC closed 2 months ago

anuC commented 3 months ago

Hi,

I was trying to run the standalone version of PepQuery with the following command to identify novel peptides:

java -Xmx30G -jar pepquery-2.0.2/pepquery-2.0.2.jar -b CPTAC_Prospective_Ovarian_JHU_Glycoproteome_PDC000251 -db /refseq_database/GRCh38_latest_protein.fasta -hc -s 1 -m 2 -o pepquery_out -i MTVLWLGSSRSCNSWQSPSCWPSWPRSVRSSGRGSPRRSAGWQSPSCWPSWPRS -t protein -fast > logfile.log

Although the program completed without any error, I noticed the following messages in the log file.

  1. The log file contains warnings stating that certain .mgf files don't exist:

2024-07-11 12:53:33 [INFO ] main.java.pg.PeptideSearchMT[search:395] - Total target peptides:40, unique peptides:39, shared peptides:1
2024-07-11 12:53:33 [INFO ] main.java.pg.PeptideSearchMT[search:407] - Valid target peptides: 39
2024-07-11 12:53:33 [INFO ] main.java.pg.PeptideSearchMT[generatePeptideInputs:1330] - Generate peptide objects ...
2024-07-11 12:53:33 [INFO ] main.java.pg.PeptideSearchMT[generatePeptideInputs:1341] - CPU: 32
2024-07-11 12:53:34 [INFO ] main.java.pg.PeptideSearchMT[generatePeptideInputs:1359] - Generate peptide objects done.
2024-07-11 12:53:34 [INFO ] main.java.pg.PeptideSearchMT[search:413] - Step 1: target peptide sequence preparation and initial filtering done: time elapsed = 0.08 min
2024-07-11 12:53:34 [INFO ] main.java.pg.PeptideSearchMT[search:418] - Step 2: candidate spectra retrieval and PSM scoring ...
2024-07-11 12:53:34 [INFO ] main.java.msio.MsdataMatch[get_all_indexed_ms_file:87] - Used CPUs: 32
2024-07-11 12:53:34 [INFO ] main.java.msio.MsdataMatch[get_all_indexed_ms_file:88] - Download 100 index MS/MS files ...
2024-07-11 12:53:36 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/36557.mgf doesn't exist
2024-07-11 12:53:36 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/50144.mgf doesn't exist
2024-07-11 12:53:36 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/27625.mgf doesn't exist
2024-07-11 12:53:36 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/24694.mgf doesn't exist
2024-07-11 12:53:36 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/49985.mgf doesn't exist
2024-07-11 12:53:36 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/50304.mgf doesn't exist
2024-07-11 12:53:37 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/50146.mgf doesn't exist
2024-07-11 12:53:37 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/50306.mgf doesn't exist
2024-07-11 12:53:37 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/49986.mgf doesn't exist
2024-07-11 12:53:37 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/36558.mgf doesn't exist
2024-07-11 12:53:37 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/27626.mgf doesn't exist
2024-07-11 12:53:38 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/50145.mgf doesn't exist
2024-07-11 12:53:38 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/49984.mgf doesn't exist
2024-07-11 12:53:38 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/50305.mgf doesn't exist
2024-07-11 12:53:38 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/36568.mgf doesn't exist
2024-07-11 12:53:38 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/19172.mgf doesn't exist
2024-07-11 12:53:38 [WARN ] main.java.msio.IndexFileDownloadWorker[download:35] - s3://zhanglab-pepquery/msms_library/CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251/32719.mgf doesn't exist
2024-07-11 12:53:39 [INFO ] main.java.pg.SpectraInput[readSpectraFromMSMSlibrary:371] - Used CPUs: 32

  2. I also noticed that although the MS dataset name is 'CPTAC_Prospective_Ovarian_JHU_Glycoproteome_PDC000251', PepQuery created an output folder named 'CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251' and printed the following messages in the log file:

2024-07-11 12:53:29 [INFO ] main.java.util.CParameterSet[load_default_parameter_sets:146] - Load 12 parameter sets.
2024-07-11 12:53:29 [INFO ] main.java.util.MSDataSet[load_default_datasets:147] - Load 48 MS/MS datasets.
2024-07-11 12:53:29 [INFO ] main.java.pg.PeptideSearchMT[main:180] - The number of MS/MS datasets selected: 1, CPTAC_Prospective_Ovarian_JHU_Glycoproteome_PDC000251
2024-07-11 12:53:29 [INFO ] main.java.pg.PeptideSearchMT[search_multiple_datasets:732] - Searching MS/MS dataset: CPTAC_Prospective_Ovarian_JHU_N_linked_Glycosite_containing_peptide_PDC000251. 0 left, 0 finished.
2024-07-11 12:53:29 [INFO ] main.java.pg.CParameter[updateCParameter:846] - Task type: novel protein identification
2024-07-11 12:53:30 [INFO ] uk.ac.ebi.pride.utilities.pridemod.io.unimod.xml.unmarshaller.UnimodUnmarshallerFactory[initializeUnmarshaller:41] - Unmarshaller Initialized
2024-07-11 12:53:30 [INFO ] main.java.OpenModificationSearch.ModificationDB[importPTMsFromUnimod:355] - All modifications in unimod:1375
2024-07-11 12:53:30 [INFO ] main.java.pg.PeptideSearchMT[search:243] - Start analysis
#############################################
PepQuery parameter:
2024-07-11 12:53:30 [INFO ] main.java.pg.DatabaseInput[getEnzymeByIndex:263] - Use enzyme:Trypsin
PepQuery version: 2.0.2
PepQuery command line: -b CPTAC_Prospective_Ovarian_JHU_Glycoproteome_PDC000251 -m 2 -db refseq_database/GRCh38_latest_protein.fasta -hc -o pepquery2_out -i MTVLWLGSSRSCNSWQSPSCWPSWPRSVRSSGRGSPRRSAGWQSPSCWPSWPRS -t protein -fast
Fixed modification: 1,11,12 = Carbamidomethylation of C,TMT 10-plex of K,TMT 10-plex of peptide N-term
Variable modification: 2,3 = Oxidation of M,Deamidation of N
Max allowed variable modification: 3
Add AA substitution: false
Enzyme: 1 = Trypsin
Max Missed cleavages: 1
Precursor mass tolerance: 20.0
Range of allowed isotope peak errors: 0
Precursor ion mass tolerance unit: ppm
Fragment ion mass tolerance: 0.05
Fragment ion mass tolerance unit: Da
Scoring algorithm: 2 = MVH
Min score: 12.0
Min peaks: 10
Min peptide length: 7
Max peptide length: 45
Min peptide mass: 500.0
Max peptide mass: 10000.0
Random peptide number: 10000
Fast mode: true
CPU: 32
#############################################
2024-07-11 12:53:30 [INFO ] main.java.pg.PeptideSearchMT[search:250] - Spectrum ID type:1, use 1-based number as index for a spectrum.

Am I doing things correctly? It would be great if you could have a look at these messages.

Any help would be highly appreciated.

Thanks in advance,
Anu

wenbostar commented 3 months ago

It means that no spectrum matched a specific peptide form (sequence + charge + modification). This is not an error message; it is only printed for debugging purposes.
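For anyone who wants to confirm that a log contains only such warnings and no errors, here is a minimal Python sketch. It assumes the log uses the entry layout shown in this issue (timestamp, bracketed level, logger name, " - ", message) and that colors are ANSI escape sequences. The helper names `strip_ansi` and `summarize_log` are hypothetical and are not part of PepQuery.

```python
import re
from collections import Counter

# Matches real ANSI color sequences (ESC [ ... m) and the caret-notation
# form "^[[...m" that appears when a log is pasted as plain text.
ANSI_RE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")

# Assumed entry layout, based on the log lines quoted in this issue:
# "2024-07-11 12:53:36 [WARN ] logger.Name[method:35] - message"
ENTRY_RE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\s*\] (\S+) - "
)

# Captures the path reported in a "<path>.mgf doesn't exist" warning.
MISSING_RE = re.compile(r"(\S+\.mgf) doesn't exist")


def strip_ansi(text):
    """Remove color codes so the log can be searched as plain text."""
    return ANSI_RE.sub("", text)


def summarize_log(text):
    """Return (count of entries per log level, list of missing .mgf paths)."""
    clean = strip_ansi(text)
    levels = Counter(match.group(1) for match in ENTRY_RE.finditer(clean))
    missing = MISSING_RE.findall(clean)
    return levels, missing


if __name__ == "__main__":
    with open("logfile.log", encoding="utf-8", errors="replace") as handle:
        levels, missing = summarize_log(handle.read())
    print("Entries per level:", dict(levels))
    print("Missing .mgf index files:", len(missing))
```

If the summary shows only INFO and WARN entries, the run matches the behavior described above: the warnings are informational and the search itself completed.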

wenbostar commented 3 months ago

MVH scoring (-m 2) is very slow. In most cases I recommend hyperscore, which is the default (-m 1) and is much faster than MVH.
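As an illustration of that suggestion, the sketch below rebuilds the command from the original post with the scoring method switched to `-m 1`. Every flag, path, and value is copied from that post and has not been checked against the PepQuery documentation. `build_pepquery_args` is a hypothetical helper used only to assemble the argument list; it does not run the search.

```python
import shlex


def build_pepquery_args(scoring_method=1):
    """Assemble the java command from the original post with a chosen -m value.

    Per the comment above, 1 selects hyperscore (the default) and 2 selects MVH.
    """
    return [
        "java", "-Xmx30G", "-jar", "pepquery-2.0.2/pepquery-2.0.2.jar",
        "-b", "CPTAC_Prospective_Ovarian_JHU_Glycoproteome_PDC000251",
        "-db", "/refseq_database/GRCh38_latest_protein.fasta",
        "-hc",
        "-s", "1",
        "-m", str(scoring_method),
        "-o", "pepquery_out",
        "-i", "MTVLWLGSSRSCNSWQSPSCWPSWPRSVRSSGRGSPRRSAGWQSPSCWPSWPRS",
        "-t", "protein",
        "-fast",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_pepquery_args(scoring_method=1)))
```

To launch the search from Python instead of printing the command, the same list could be passed to `subprocess.run` with stdout redirected to a log file.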

anuC commented 2 months ago

Thank you for the detailed reply.