Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher - powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

error reading PSM with unique FASTA #1301

Closed ambgrav closed 10 months ago

ambgrav commented 10 months ago

unique_log_2023-10-19_20-21-48.txt fragpipefasta_log_2023-10-19_12-59-53.txt

Hi,

I have two FASTA files. One was downloaded directly from FragPipe. The other I built myself: I uploaded a FASTA of unique protein mutants (without decoys or contaminants) to FragPipe and used the Database tab to add decoys. The add-decoys function did not complete, but it still generated decoys for the unique protein mutants. I ended up copying and pasting the mutant decoys and my unique protein sequences directly into the original FASTA generated by FragPipe, in order to build a complete FASTA file with the decoys, contaminants, unique sequences, and the rest of the human proteome. As far as I can tell, the formatting is the same between the two FASTAs.

However, when running a search using my unique FASTA, the error pasted below occurs after the main search. This error does not occur when using the FASTA generated directly from fragpipe. Attached are both log files. The one titled "unique" is the one with the error and was run using the FASTA I generated.

I am very new to this, so any help would be appreciated. Thank you!

All files have been read
Percolator version 3.06.0, Build Date May 11 2022 12:43:39
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the Department of Genome Sciences at the University of Washington.
Issued command:
C:\Users\malyl\Desktop\Plugins\fragpipe\tools\percolator-306\percolator.exe --only-psms --no-terminate --post-processing-tdc --num-threads 31 --results-psms A20230726_AG_DIA_2mgmL_300uL_20uL_c_percolator_target_psms.tsv --decoy-results-psms A20230726_AG_DIA_2mgmL_300uL_20uL_c_percolator_decoypsms.tsv --protein-decoy-pattern rev A20230726_AG_DIA_2mgmL_300uL_20uL_c.pin
Started Thu Oct 19 20:21:47 2023
Hyperparameters: selectionFdr=0.01, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile A20230726_AG_DIA_2mgmL_300uL_20uL_c.pin
Features:
rank log10_intensity abs_ppm abs_rt_diff complementary_ions ion_series weighted_average_abs_fragment_ppm weighted_average_fragment_rt_deviation precursor_rt_deviation isotopes log10_kl log10_kl_negative_1 log10_kl_negative_2 hyperscore matched_ion_num peptide_length ntt nmc charge_1 charge_2 charge_3 charge_4 charge_5 charge_6 charge_7_or_more group_1 group_2 group_3 group_other
Exception caught:
ERROR: Reading tab file, error reading PSM A20230726_AG_DIA_2mgmL_300uL_20uL_c.46689.46689.4_4. Check if a peptide and at least one protein are specified.

Process 'Percolator' finished, exit code: 1
Process returned non-zero exit code, stopping

Cancelling 22 remaining tasks

fcyu commented 10 months ago

Something must be wrong with your fasta file. Please double check it.

Best,

Fengchao

ambgrav commented 10 months ago

Thank you for the quick response! Do you have any tips on how I should check the FASTA file? I converted both to txt files and visually compared the two in Excel. Both have the same header formatting, and I can't find any differences other than the additional mutants.
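
A visual comparison can miss a single stray character, so a line-by-line check is one possible alternative. The sketch below is not part of FragPipe; the function names and the set of characters allowed in sequence lines (letters, `*`, `-`) are assumptions made for illustration. It flags every line that is neither a header starting with `>` nor a plain sequence line.

```python
import re

# Assumption: sequence lines contain only letters, '*' (stop) and '-' (gap).
SEQUENCE_RE = re.compile(r"^[A-Za-z*-]+$")


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs that are neither a header nor a sequence line."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            # A header needs an identifier directly after '>'.
            if len(line) == 1 or line[1].isspace():
                problems.append((number, line))
            continue
        if not SEQUENCE_RE.match(line):
            problems.append((number, line))
    return problems


def check_fasta(path):
    """Run the check on a file; undecodable bytes are replaced so they get flagged."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspicious_lines(handle)
```

Calling `check_fasta` on the custom database (the path is whatever your file is called) returns the line numbers to inspect in a plain text editor. A header that begins with any character other than `>` is reported, because it fails both tests.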

fcyu commented 10 months ago

Can you share A20230726_AG_DIA_2mgmL_300uL_20uL_c.pin and rev_LckNeutCys_UNIPROT_HUMAN_DIA.fas with me?

Also, I see that there are spaces in the path C:\Users\malyl\Desktop\Plugins\ MAXQUANT FASTAS \rev_LckNeutCys_UNIPROT_HUMAN_DIA.fas. Please remove them.

Best,

Fengchao
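
For anyone who wants to check a path before launching a run, here is a minimal sketch of the whitespace check described above. The helper name `parts_with_whitespace` is hypothetical and is not a FragPipe function; it only uses Python's standard `pathlib`.

```python
from pathlib import PureWindowsPath


def parts_with_whitespace(path_str):
    """Return the components of a Windows-style path that contain whitespace."""
    parts = PureWindowsPath(path_str).parts
    return [part for part in parts if any(ch.isspace() for ch in part)]
```

An empty list means no folder or file name in the path contains a space or tab. Otherwise, the returned components are the ones to rename.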

fcyu commented 10 months ago

BTW, in your log files, you have ptmshepherd.glyco_mode=true, which should be false. You can uncheck the following box.

image

Best,

Fengchao

ambgrav commented 10 months ago

Hi, yes. I am rerunning a search using a FASTA with different sequences so that I don't share confidential data. I'll send you the files if the search produces the same error. Thank you for all of your help so far!

ambgrav commented 10 months ago

fake_log_2023-10-20_15-31-24.txt

google drive link for A20230726_AG_DIA_2mgmL_300uL_20uL_c.pin: https://drive.google.com/file/d/1WzL13XYqhhwr-gBMNvnOaXeo2K0wkcnt/view?usp=share_link

google drive link for .fas: https://drive.google.com/file/d/1uOTV2Jd3doiLKqajED9Opt9237rHXl0E/view?usp=share_link

Here are the files. I reran the search but replaced the mutants with different protein sequences. I also unchecked the glycan box and removed spaces from the file names. Seems like the same error was encountered.

fcyu commented 10 months ago

Thanks for your files. Could you remove the double quotes (") from the protein headers and try again? For example:

">contam_sp|O77727|K1C15_SHEEP Keratin, type I cytoskeletal 15 OS=Ovis aries OX=9940 GN=KRT15 PE=2 SV=1"

Best,

Fengchao
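
If many headers are affected, the quotes can be removed with a short script rather than by hand. This is only a sketch under assumptions: the function names are made up for illustration, and it treats a line as a header if it starts with `>` once any leading `"` characters are ignored, which matches the example header above. Sequence lines are left untouched.

```python
def strip_header_quotes(lines):
    """Return (cleaned_lines, n_changed), removing '"' from FASTA header lines only."""
    cleaned = []
    changed = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        # A header starts with '>' once any leading quotes are ignored.
        if line.lstrip('"').startswith(">") and '"' in line:
            line = line.replace('"', "")
            changed += 1
        cleaned.append(line)
    return cleaned, changed


def clean_fasta(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path and return the number of fixed headers."""
    with open(src_path, encoding="utf-8") as src:
        cleaned, changed = strip_header_quotes(src)
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write("\n".join(cleaned) + "\n")
    return changed
```

Writing to a new file keeps the original database intact, so the two can be compared before the cleaned copy is used in a search.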

ambgrav commented 10 months ago

Hi Fengchao,

That fixed the problem!! Thank you again for all of your help.

-Amber