Closed ypriverol closed 1 year ago
@ypriverol do you have a link to the 20150820_Haura-Pilot-TMT1-bRPLC11-1_sage.idXML file?
hit is annotated with charge 0. will check why this is the case
<PeptideIdentification score_type="hyperscore" higher_score_better="true" significance_threshold="0.0" RT="64.922569999999993" spectrum_reference="controllerType=0 controllerNumber=1 scan=20819" >
<PeptideHit score="4.141100637487535" sequence=".(TMT6plex)MLLQQLDALELASEGSLK(TMT6plex)PVVLSSDPQLLFKGQFPQLSLK" charge="0" aa_before="K" aa_after="H" start="1212" end="1251" protein_refs="PH_3410" >
<UserParam type="string" name="target_decoy" value="decoy"/>
<UserParam type="string" name="ln(-poisson)" value="2.720636655734263"/>
<UserParam type="string" name="ln(delta_best)" value="0.0"/>
<UserParam type="string" name="ln(delta_next)" value="2.562488632470743"/>
<UserParam type="string" name="ln(matched_intensity_pct)" value="3.0516565"/>
<UserParam type="string" name="longest_b" value="3"/>
<UserParam type="string" name="longest_y" value="5"/>
<UserParam type="string" name="longest_y_pct" value="0.125"/>
<UserParam type="string" name="matched_peaks" value="22"/>
<UserParam type="string" name="scored_candidates" value="259"/>
<UserParam type="string" name="protein_references" value="unique"/>
</PeptideHit>
<UserParam type="string" name="PinSpecId" value="273895"/>
</PeptideIdentification>
Interesting because the charge range do not go from 0.
The others have a charge assigned. Hmm can we get the sage output somehow "results.sage.pin" to check if it is already written as zero in by sage?
The error is quite common:
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR (Blank)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR (Blank)` terminated with an error exit status (8)
Command executed:
OMP_NUM_THREADS=6 PercolatorAdapter \
-in Blank_sage.idXML \
-out Blank_sage_perc.idXML \
-threads 6 \
-subset_max_train 300000 \
-decoy_pattern DECOY_ \
-post_processing_tdc \
-score_type pep \
-debug 0 \
2>&1 | tee Blank_sage_percolator.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:PERCOLATOR":
PercolatorAdapter: $(PercolatorAdapter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g')
percolator: $(percolator -h 2>&1 | grep -E '^Percolator version(.*)' | sed 's/Percolator version //g')
END_VERSIONS
Command exit status:
8
Command output:
Loading input file: Blank_sage.idXML
Merging peptide ids.
Merging protein ids.
Error: Unexpected internal error (the value 'LALLFYILTFVGAIFNGLTLLILGVIGLFTIPLLYR' was used but is not valid; Can't calculate mass-to-charge ratio for charge=0.)
Command wrapper:
Loading input file: Blank_sage.idXML
Merging peptide ids.
Merging protein ids.
Error: Unexpected internal error (the value 'LALLFYILTFVGAIFNGLTLLILGVIGLFTIPLLYR' was used but is not valid; Can't calculate mass-to-charge ratio for charge=0.)
Work dir:
/hps/nobackup/juan/pride/reanalysis/single-cell/PXD016921/work/f6/d7eb6fe239a22d425ad2597e861217
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
hmm the peptide is written as charge 1 by sage.
SpecId Label ScanNr ExpMass CalcMass FileName retentiontime rank z=2 z=3 z=4 z=5 z=6 z=other peptide_len missed_cleavages isotope_error ln(precursor_ppm) fragment_ppm ln(hyperscore) ln(delta_next) ln(delta_best) aligned_rt predicted_rt sqrt(delta_rt_model) matched_peaks longest_b longest_y longest_y_pct ln(matched_intensity_pct) scored_candidates ln(-poisson) Peptide Proteins
273895 -1 20819 4838.7183 4838.7266 20150820_Haura-Pilot-TMT1-bRPLC11-1.mzML 64.92257 1 0 0 0 0 0 1 40 1 0.0 0.99897254 22.651972 4.141100637487535 2.562488632470743 0.0 64.92257 0.0 0.99949986 22 3 5 0.125 3.0516565 259 2.720636655734263 [+229.16293]-MLLQQLDALELASEGSLK[+229.16293]PVVLSSDPQLLFKGQFPQLSLK DECOY_sp|Q92766|RREB1_HUMAN
For me it looks like it's annotated as "other" and SageAdapter makes a 0 out of it: https://github.com/OpenMS/OpenMS/blob/5e4854c9212847baf683981fe114f08f3534041f/src/openms/source/FORMAT/PercolatorInfile.cpp#L117
Is z=other documented? If not we can ask on sage's GitHub?
Interesting thing is that afaik the hardcoded charge range in sage is 2-5 in our version. How can there be 1 or 6?
Testing now.
Still the error remains
Where can we see the updated logs?
@jpfeuffer @timosachsenberg this error now is in CONSENSUSID:
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:ID:CONSENSUSID (20150820_Haura-Pilot-TMT1-bRPLC01-2)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:TMT:ID:CONSENSUSID (20150820_Haura-Pilot-TMT1-bRPLC01-2)` terminated with an error exit status (8)
Command executed:
ConsensusID \
-in 20150820_Haura-Pilot-TMT1-bRPLC01-2_comet_feat_perc.idXML 20150820_Haura-Pilot-TMT1-bRPLC01-2_msgf_feat_perc.idXML 20150820_Haura-Pilot-TMT1-bRPLC01-2_sage_perc.idXML \
-out 20150820_Haura-Pilot-TMT1-bRPLC01-2_consensus.idXML \
-per_spectrum \
-threads 1 \
-algorithm best \
-filter:min_support 0 \
-filter:considered_hits 0 \
-debug 0 \
\
2>&1 | tee null_consensusID.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:ID:CONSENSUSID":
ConsensusID: $(ConsensusID 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
END_VERSIONS
Command exit status:
8
Command output:
Error: Unexpected internal error (the value '1' was used but is not valid; Conflicting charge states found for peptide 'VTGISGATAENVPVEQIVENNDIIILTPQILVNNLK(TMT6plex)': 7, 1)
Command wrapper:
Error: Unexpected internal error (the value '1' was used but is not valid; Conflicting charge states found for peptide 'VTGISGATAENVPVEQIVENNDIIILTPQILVNNLK(TMT6plex)': 7, 1)
Work dir:
/hps/nobackup/juan/pride/reanalysis/differential-expression/tmt/PXD004683/work/3d/cd8e3f8ef03b30f3e01203b8ce664c
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
Ok this error is triggered in ConsensusID now: https://github.com/OpenMS/OpenMS/blob/f958a6bc954de76f373ea5bf659c04c11b0fce38/src/openms/source/ANALYSIS/ID/ConsensusIDAlgorithm.cpp#L135
Ah ok, I think the reason is that the spectrum is annotated as charge 7 (!) and sage sets the "z=other" column to 1 (see https://github.com/lazear/sage/blob/675d507bea942a792a87408ddf4f8c38195b59cf/crates/sage-cli/src/output.rs#L196C27-L200 ). I misinterpreted how this is handled by sage and thought that the z=other column will get the value 7 in this case.
Because I can't reconstruct the proper charge from the pin file I think the best approach currently would be to just filter those PSMs.
@lazear would it make sense to encode charges outside the expected range in the "z=other" column instead of using a one-hot encoding? That way I could set a proper charge for the hit without going back to the spectra. For the SVM this should be no problem.
That is correct, and I can encode the charges outside of the range as an integer value (e.g. 2 or 7) instead of one-hot-encoding them.
Are you guys actually running percolator directly off of the .pin files? Could also pull the same features (not all log-transformed though), like the actual integer value of charge, from the results.sage.tsv
file - up to you if that's an easier way.
For consistency reasons, we currently only read the pin file. Reason is that this way we have full control (e.g., we can still annotate some metadata needed to do consensus identification etc.).
Gotcha - I will update the pin file
This is solved.
Description of the bug
Command used and terminal output
No response
Relevant files
No response
System information
No response