Open wsnoble opened 7 years ago
The first place to start looking would be in (all of these files are in the src directory) app/PredictPeptideIons.cpp, which is the code for the predict-peptide-ions command. On line 74, we can see that a modified sequence will immediately cause an error because valid_peptide_sequence (util/crux-utils.cpp:1016) only checks that each character in a string is an uppercase letter. This would be the first place to fix.
Looking at the rest of the code for predict-peptide-ions, it seems like most of the work is done by an IonSeries object (model/IonSeries.cpp). I would start with the constructor (model/IonSeries.cpp:124) since the modified sequence is getting passed in here. convert_to_mod_aa_seq is getting called with the sequence, and it can handle modified sequences so it's okay. But on line 138 peptidelength is getting set to peptide_.length(), i.e. the length of the modified sequence string, which will be a mistake. Rather, it should be set to the return value of convert_to_mod_aa_seq since that function should return the correct peptide length.
I think after those 2 fixes, we should check some example results and see if they are correct (I am not at all familiar with the IonConstraint/IonSeries code, so there may more changes needed).
I tried to fix these two problems.
For the first fix I just let pass more symbols.
The code is changed at the util/crux-utils.cpp:1016
if (!isupper(*i)&&!isdigit(*i)&& *i!='.'&& *i!='['&& *i!=']'&& *i!='-'){
For the second problem in model/IonSeries.cpp, I looked into the convert_to_mod_aa_seq function in model/modification.cpp between line 472 and line 552. It seems this function is able to convert sequence string with bracket mass to mod_sequence.
int convert_to_mod_aa_seq(const string& sequence,
MODIFIED_AA_T** mod_sequence, MASS_FORMAT_T mass_format){
…
*mod_sequence = new_sequence;
return mod_idx;
}
So then I changed line 138 in model/IonSeries.cpp to the length of returned value. But I am not so sure about the MODIFIED_AA_T
type.
peptide_length_ = (*modified_aa_seq_).length();
I tried to test this function but I can't since I don’t know how to step into to successfully compile with intertwined classes and functions.
Hi Fangfei,
That is almost right. MODIFIED_AA_T is defined in model/objects.h at line 646:
typedef uint16_t MODIFIED_AA_T;
Meaning it is just a 16 bit unsigned integer, representing an amino acid that is potentially modified. It doesn't have a length()
method.
The convert_to_mod_aa_seq
function is passed in mod_sequence
, a pointer to a pointer of MODIFIED_AA_T. The function sets the value pointed at by mod_sequence to an array of MODIFIED_AA_T, representing the modified sequence. But the actual return value of the function is an int, which is the length of the sequence.
So, at IonSeries.cpp:134 you should store the return value of convert_to_mod_aa_seq
as the peptide length:
peptide_length_ = convert_to_mod_aa_seq(peptide_, &modified_aa_seq_);
(and then delete line 138)
Dear Bill and Kaipo,
Thanks for your help. The predict-peptide-ions
seems running now.
Bill told me to test the modified code(changes in util/crux-utils.cpp; model/IonSeries.cpp;) by configuring the code.
Below are the commands for building:
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH=$HOME/crux .
$ make
$ make install
The error occurs in make
initially:
-- Built ProteoWizard.
[ 4%] Performing install step for 'ProteoWizard'
/bin/sh: 1: /home/fangfeiz/Software/crux-toolkit-master/ext/create_links.sh: Permission denied
ext/CMakeFiles/ProteoWizard.dir/build.make:73: recipe for target 'ext/build/src/ProteoWizard-stamp/ProteoWizard-install' failed
make[2]: *** [ext/build/src/ProteoWizard-stamp/ProteoWizard-install] Error 126
CMakeFiles/Makefile2:424: recipe for target 'ext/CMakeFiles/ProteoWizard.dir/all' failed
make[1]: *** [ext/CMakeFiles/ProteoWizard.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2
Then I chmod +x ext/*.sh;
the error is solved and crux is successfully built.
For testing, I tried K[42.01]R 1, it works on implementation level.
$ ./crux predict-peptide-ions K[42.01]R 1
INFO: Beginning predict-peptide-ions.
WARNING: No modification identifier found for mass shift 42.01.
Warning Suppressed, others may exist
INFO: Creating modification for 42.009998
# PEPTIDE: K[42.01]R
# AVERAGE: 476.423187 MONO:907.944397
# CHARGE: 1
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z mass charge ion-series peptide-bond-index nh3 h2o isotope flank
171.11 171.11 1 b 1 0 0 0 0
175.12 175.12 1 y 1 0 0 0 0
INFO: Elapsed time: 0.000224 s
INFO: Finished crux predict-peptide-ions.
INFO: Return Code:0
INFO: Ion cache check in: 2 check out:2
K[-17.00]R: Negative mass shift also works.
$ ./crux predict-peptide-ions K[-17.00]R 1
INFO: Beginning predict-peptide-ions.
WARNING: No modification identifier found for mass shift -17.00.
Warning Suppressed, others may exist
INFO: Creating modification for -17.000000
# PEPTIDE: K[-17.00]R
# AVERAGE: 476.423187 MONO:979.023254
# CHARGE: 1
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z mass charge ion-series peptide-bond-index nh3 h2o isotope flank
112.10 112.10 1 b 1 0 0 0 0
175.12 175.12 1 y 1 0 0 0 0
INFO: Elapsed time: 0.000398 s
INFO: Finished crux predict-peptide-ions.
INFO: Return Code:0
INFO: Ion cache check in: 2 check out:2
However, there is still aWARNING: No modification identifier found for mass shift 42.01.
42.01 should be monoisotopic mass shift of Acetylation according to Unimod. I am wondering whether we also need modification identifier for similarity calculation. If so, where are the "modification identifiers" located.
Best, Fangfei
The warning in this case doesn't necessarily mean that anything is wrong - it just means that Crux hasn't seen that particular modification yet. If your changes are ready for a code review, you can create a patch file and attach it here.
FZ20170613_PredictPeptideIon.diff.txt
Attached .txt is the patch file. I also add the corresponding cucumber smoke-test to it.
The smoke tests are not passed in the below scenarios. They are also not passed in the original code.
Failing Scenarios:
cucumber features/psm-convert.feature:35 # Scenario Outline: User runs psm-convert, Examples (#7)
cucumber features/psm-convert.feature:36 # Scenario Outline: User runs psm-convert, Examples (#8)
cucumber features/search-for-xlinks.feature:17 # Scenario Outline: User runs search-for-xlinks, Examples (#2)
cucumber features/search-for-xlinks.feature:19 # Scenario Outline: User runs search-for-xlinks, Examples (#3)
120 scenarios (4 failed, 116 passed)
943 steps (4 failed, 939 passed)
6m12.728s
good_results/psmconv-from-txt1.pep.xml.observed --> good_results/psmconv-from-txt1.pep.xml
good_results/psmconv-from-txt2.pep.xml.observed --> good_results/psmconv-from-txt2.pep.xml
good_results/search-for-xlinks.cz.txt.observed --> good_results/search-for-xlinks.cz.txt
good_results/search-xlink.target.txt.observed --> good_results/search-xlink.target.txt
Hmm, they are passing on my machine. Could you try a diff between the observed and expected and see what's different between them?
What operating system are you running the tests on? If you run the full sets of tests on an unmodified version of Crux, how many pass, and which ones fail?
My OS is Ubuntu 16.04.2 LTS. I re-ran the smoke test, now one set failed. Attached are the diff files.
diff psmconv-from-txt1.pep.xml psmconv-from-txt1.pep.xml.observed >~/psmconv-from-txt1.pep.xml.diff.txt diff psmconv-from-txt2.pep.xml psmconv-from-txt2.pep.xml.observed >~/psmconv-from-txt2.pep.xml.diff.txt
psmconv-from-txt2.pep.xml.diff.txt
psmconv-from-txt1.pep.xml.diff.txt
Some strange characters are inside.
I showed these errors to Bill today, he said it could be OS dependent problem. But I can already start to use predict peptide ion in crux for analysis.
What is the status of this patch? Kaipo, do the smoke tests pass for you, using Fangfei's changes?
Fangfei, I am getting a failure with the new test you've added. standard_predict_ions-mod.out Expected output
# PEPTIDE: IAMAS[42.01]EQ
# AVERAGE: 922.895569 MONO:1354.080566
# CHARGE: 2
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z mass charge ion-series peptide-bond-index nh3 h2o isotope flank
114.09 114.09 1 b 1 0 0 0 0
147.08 147.08 1 y 1 0 0 0 0
185.13 185.13 1 b 2 0 0 0 0
276.12 276.12 1 y 2 0 0 0 0
316.17 316.17 1 b 3 0 0 0 0
405.16 405.16 1 y 3 0 0 0 0
387.21 387.21 1 b 4 0 0 0 0
476.20 476.20 1 y 4 0 0 0 0
516.25 516.25 1 b 5 0 0 0 0
607.24 607.24 1 y 5 0 0 0 0
645.29 645.29 1 b 6 0 0 0 0
678.28 678.28 1 y 6 0 0 0 0
Actual output
# PEPTIDE: IAMAS[42.01]EQ
# AVERAGE: 1090.935547 MONO:1354.080566
# CHARGE: 2
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z mass charge ion-series peptide-bond-index nh3 h2o isotope flank
114.09 114.09 1 b 1 0 0 0 0
147.08 147.08 1 y 1 0 0 0 0
185.13 185.13 1 b 2 0 0 0 0
276.12 276.12 1 y 2 0 0 0 0
316.17 316.17 1 b 3 0 0 0 0
405.16 405.16 1 y 3 0 0 0 0
387.21 387.21 1 b 4 0 0 0 0
476.20 476.20 1 y 4 0 0 0 0
516.25 516.25 1 b 5 0 0 0 0
607.24 607.24 1 y 5 0 0 0 0
645.29 645.29 1 b 6 0 0 0 0
678.28 678.28 1 y 6 0 0 0 0
I checked that their fragment b/y ions are matched and correct. On my crux build the AVERAGE is still 922.895569, but both AVERAGE
and MONO
values are incorrect for bracketed mass. It's because the current app/PredictPeptideIons.cpp
(line 116:117) calls Peptide::calcSequenceMass
to calculate mass which cannot deal with brackets.
Hi Fangfei, is this something you can easily fix?
Kaipo, can you give advice about how to proceed here? Is there an existing alternative to calcSequenceMass that does take into account brackets, or should Fangfei write one?
Bill
On Tue, Jun 27, 2017 at 2:06 AM, Fangfei Zhang notifications@github.com wrote:
I checked that their fragment b/y ions are matched and correct. On my crux build the AVERAGE is still 922.895569, but both AVERAGE and MONO values are incorrect for bracketed mass. It's because the current app/PredictPeptideIons.cpp(line 116:117) calls Peptide::calcSequenceMass to calculate mass which cannot deal with brackets.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/crux-toolkit/crux-toolkit/issues/389#issuecomment-311215446, or mute the thread https://github.com/notifications/unsubscribe-auth/AFieqNGifl2Eo1afB9pf0sngc_nenmBxks5sIEeUgaJpZM4N1Gl1 .
I actually just recently added a method that I think will work here, it is defined at src/model/Modification.cpp with the signature void Modification::FromSeq(const string& seq, string* outSeq, vector<Modification>* outMods)
. It takes a sequence as a string and returns the unmodified sequence and a vector of the modifications.
for example:
string unmodifiedSequence;
vector<Modification> mods;
Modification::FromSeq(peptide_sequence, &unmodifiedSequence, &mods);
Then we can call Peptide::calcSequenceMass
on the unmodified sequence, and loop over the vector of modifications and sum the masses and add it, something like this:
FLOAT_T averageMass = Crux::Peptide::calcSequenceMass(unmodifiedSequence, AVERAGE);
FLOAT_T monoMass = Crux::Peptide::calcSequenceMass(unmodifiedSequence, MONO);
for (vector<Modification>::const_iterator i = mods.begin(); i != mods.end(); i++) {
averageMass += i->DeltaMass();
monoMass += i->DeltaMass();
}
It would be nice if the user could specify modifications (in brackets) when the peptide is specified on the command line.
Kaipo, can you comment on how difficult this would be to do?