crux-toolkit / crux-toolkit

http://crux.ms/
Other
32 stars 26 forks source link

predict-peptide-ions does not allow modifications #389

Open wsnoble opened 7 years ago

wsnoble commented 7 years ago

It would be nice if the user could specify modifications (in brackets) when the peptide is specified on the command line.

Kaipo, can you comment on how difficult this would be to do?

zhangfa7 commented 7 years ago

Kaipo:

The first place to start looking would be in (all of these files are in the src directory) app/PredictPeptideIons.cpp, which is the code for the predict-peptide-ions command. On line 74, we can see that a modified sequence will immediately cause an error because valid_peptide_sequence (util/crux-utils.cpp:1016) only checks that each character in a string is an uppercase letter. This would be the first place to fix.

Looking at the rest of the code for predict-peptide-ions, it seems like most of the work is done by an IonSeries object (model/IonSeries.cpp). I would start with the constructor (model/IonSeries.cpp:124) since the modified sequence is getting passed in here. convert_to_mod_aa_seq is getting called with the sequence, and it can handle modified sequences so it's okay. But on line 138 peptidelength is getting set to peptide_.length(), i.e. the length of the modified sequence string, which will be a mistake. Rather, it should be set to the return value of convert_to_mod_aa_seq since that function should return the correct peptide length.

I think after those 2 fixes, we should check some example results and see if they are correct (I am not at all familiar with the IonConstraint/IonSeries code, so there may more changes needed).

Fangfei:

I tried to fix these two problems. For the first fix I just let pass more symbols. The code is changed at the util/crux-utils.cpp:1016 if (!isupper(*i)&&!isdigit(*i)&& *i!='.'&& *i!='['&& *i!=']'&& *i!='-'){

For the second problem in model/IonSeries.cpp, I looked into the convert_to_mod_aa_seq function in model/modification.cpp between line 472 and line 552. It seems this function is able to convert sequence string with bracket mass to mod_sequence.

int convert_to_mod_aa_seq(const string& sequence, 
                          MODIFIED_AA_T** mod_sequence,  MASS_FORMAT_T mass_format){
…
  *mod_sequence = new_sequence;
   return mod_idx;
}

So then I changed line 138 in model/IonSeries.cpp to the length of returned value. But I am not so sure about the MODIFIED_AA_Ttype. peptide_length_ = (*modified_aa_seq_).length();

I tried to test this function but I can't since I don’t know how to step into to successfully compile with intertwined classes and functions.

kaipot commented 7 years ago

Hi Fangfei,

That is almost right. MODIFIED_AA_T is defined in model/objects.h at line 646: typedef uint16_t MODIFIED_AA_T; Meaning it is just a 16 bit unsigned integer, representing an amino acid that is potentially modified. It doesn't have a length() method.

The convert_to_mod_aa_seq function is passed in mod_sequence, a pointer to a pointer of MODIFIED_AA_T. The function sets the value pointed at by mod_sequence to an array of MODIFIED_AA_T, representing the modified sequence. But the actual return value of the function is an int, which is the length of the sequence.

So, at IonSeries.cpp:134 you should store the return value of convert_to_mod_aa_seq as the peptide length: peptide_length_ = convert_to_mod_aa_seq(peptide_, &modified_aa_seq_); (and then delete line 138)

zhangfa7 commented 7 years ago

Dear Bill and Kaipo,

Thanks for your help. The predict-peptide-ions seems running now.

Bill told me to test the modified code(changes in util/crux-utils.cpp; model/IonSeries.cpp;) by configuring the code.

Below are the commands for building:

$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH=$HOME/crux .
$ make
$ make install

The error occurs in make initially:

-- Built ProteoWizard.
[  4%] Performing install step for 'ProteoWizard'
/bin/sh: 1: /home/fangfeiz/Software/crux-toolkit-master/ext/create_links.sh: Permission denied
ext/CMakeFiles/ProteoWizard.dir/build.make:73: recipe for target 'ext/build/src/ProteoWizard-stamp/ProteoWizard-install' failed
make[2]: *** [ext/build/src/ProteoWizard-stamp/ProteoWizard-install] Error 126
CMakeFiles/Makefile2:424: recipe for target 'ext/CMakeFiles/ProteoWizard.dir/all' failed
make[1]: *** [ext/CMakeFiles/ProteoWizard.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Then I chmod +x ext/*.sh;the error is solved and crux is successfully built.

For testing, I tried K[42.01]R 1, it works on implementation level.

$ ./crux predict-peptide-ions  K[42.01]R 1
INFO: Beginning predict-peptide-ions.
WARNING: No modification identifier found for mass shift 42.01.
Warning Suppressed, others may exist
INFO: Creating modification for 42.009998
# PEPTIDE: K[42.01]R
# AVERAGE: 476.423187 MONO:907.944397
# CHARGE: 1
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z     mass    charge  ion-series      peptide-bond-index      nh3     h2o     isotope flank
171.11  171.11  1       b       1       0       0       0       0
175.12  175.12  1       y       1       0       0       0       0
INFO: Elapsed time: 0.000224 s
INFO: Finished crux predict-peptide-ions.
INFO: Return Code:0
INFO: Ion cache check in: 2 check out:2

K[-17.00]R: Negative mass shift also works.

$ ./crux predict-peptide-ions  K[-17.00]R 1
INFO: Beginning predict-peptide-ions.
WARNING: No modification identifier found for mass shift -17.00.
Warning Suppressed, others may exist
INFO: Creating modification for -17.000000
# PEPTIDE: K[-17.00]R
# AVERAGE: 476.423187 MONO:979.023254
# CHARGE: 1
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z     mass    charge  ion-series      peptide-bond-index      nh3     h2o     isotope flank
112.10  112.10  1       b       1       0       0       0       0
175.12  175.12  1       y       1       0       0       0       0
INFO: Elapsed time: 0.000398 s
INFO: Finished crux predict-peptide-ions.
INFO: Return Code:0
INFO: Ion cache check in: 2 check out:2

However, there is still aWARNING: No modification identifier found for mass shift 42.01. 42.01 should be monoisotopic mass shift of Acetylation according to Unimod. I am wondering whether we also need modification identifier for similarity calculation. If so, where are the "modification identifiers" located.

Best, Fangfei

kaipot commented 7 years ago

The warning in this case doesn't necessarily mean that anything is wrong - it just means that Crux hasn't seen that particular modification yet. If your changes are ready for a code review, you can create a patch file and attach it here.

zhangfa7 commented 7 years ago

FZ20170613_PredictPeptideIon.diff.txt

Attached .txt is the patch file. I also add the corresponding cucumber smoke-test to it.

The smoke tests are not passed in the below scenarios. They are also not passed in the original code.

Failing Scenarios:
cucumber features/psm-convert.feature:35 # Scenario Outline: User runs psm-convert, Examples (#7)
cucumber features/psm-convert.feature:36 # Scenario Outline: User runs psm-convert, Examples (#8)
cucumber features/search-for-xlinks.feature:17 # Scenario Outline: User runs search-for-xlinks, Examples (#2)
cucumber features/search-for-xlinks.feature:19 # Scenario Outline: User runs search-for-xlinks, Examples (#3)

120 scenarios (4 failed, 116 passed)
943 steps (4 failed, 939 passed)
6m12.728s

good_results/psmconv-from-txt1.pep.xml.observed --> good_results/psmconv-from-txt1.pep.xml
good_results/psmconv-from-txt2.pep.xml.observed --> good_results/psmconv-from-txt2.pep.xml
good_results/search-for-xlinks.cz.txt.observed --> good_results/search-for-xlinks.cz.txt
good_results/search-xlink.target.txt.observed --> good_results/search-xlink.target.txt
kaipot commented 7 years ago

Hmm, they are passing on my machine. Could you try a diff between the observed and expected and see what's different between them?

wsnoble commented 7 years ago

What operating system are you running the tests on? If you run the full sets of tests on an unmodified version of Crux, how many pass, and which ones fail?

zhangfa7 commented 7 years ago

My OS is Ubuntu 16.04.2 LTS. I re-ran the smoke test, now one set failed. Attached are the diff files. diff psmconv-from-txt1.pep.xml psmconv-from-txt1.pep.xml.observed >~/psmconv-from-txt1.pep.xml.diff.txt diff psmconv-from-txt2.pep.xml psmconv-from-txt2.pep.xml.observed >~/psmconv-from-txt2.pep.xml.diff.txt psmconv-from-txt2.pep.xml.diff.txt psmconv-from-txt1.pep.xml.diff.txt Some strange characters are inside.

I showed these errors to Bill today, he said it could be OS dependent problem. But I can already start to use predict peptide ion in crux for analysis.

wsnoble commented 7 years ago

What is the status of this patch? Kaipo, do the smoke tests pass for you, using Fangfei's changes?

kaipot commented 7 years ago

Fangfei, I am getting a failure with the new test you've added. standard_predict_ions-mod.out Expected output

# PEPTIDE: IAMAS[42.01]EQ
# AVERAGE: 922.895569 MONO:1354.080566
# CHARGE: 2
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z mass  charge  ion-series  peptide-bond-index  nh3 h2o isotope flank
114.09  114.09  1 b 1 0 0 0 0
147.08  147.08  1 y 1 0 0 0 0
185.13  185.13  1 b 2 0 0 0 0
276.12  276.12  1 y 2 0 0 0 0
316.17  316.17  1 b 3 0 0 0 0
405.16  405.16  1 y 3 0 0 0 0
387.21  387.21  1 b 4 0 0 0 0
476.20  476.20  1 y 4 0 0 0 0
516.25  516.25  1 b 5 0 0 0 0
607.24  607.24  1 y 5 0 0 0 0
645.29  645.29  1 b 6 0 0 0 0
678.28  678.28  1 y 6 0 0 0 0

Actual output

# PEPTIDE: IAMAS[42.01]EQ
# AVERAGE: 1090.935547 MONO:1354.080566
# CHARGE: 2
# MAX-ION-CHARGE: peptide
# NH3 modification: 0
# H2O modification: 0
# ISOTOPE modification: 0
# FLANK modification: 0
m/z mass  charge  ion-series  peptide-bond-index  nh3 h2o isotope flank
114.09  114.09  1 b 1 0 0 0 0
147.08  147.08  1 y 1 0 0 0 0
185.13  185.13  1 b 2 0 0 0 0
276.12  276.12  1 y 2 0 0 0 0
316.17  316.17  1 b 3 0 0 0 0
405.16  405.16  1 y 3 0 0 0 0
387.21  387.21  1 b 4 0 0 0 0
476.20  476.20  1 y 4 0 0 0 0
516.25  516.25  1 b 5 0 0 0 0
607.24  607.24  1 y 5 0 0 0 0
645.29  645.29  1 b 6 0 0 0 0
678.28  678.28  1 y 6 0 0 0 0
zhangfa7 commented 7 years ago

I checked that their fragment b/y ions are matched and correct. On my crux build the AVERAGE is still 922.895569, but both AVERAGE and MONO values are incorrect for bracketed mass. It's because the current app/PredictPeptideIons.cpp(line 116:117) calls Peptide::calcSequenceMass to calculate mass which cannot deal with brackets.

thabangh commented 7 years ago

Hi Fangfei, is this something you can easily fix?

Kaipo, can you give advice about how to proceed here? Is there an existing alternative to calcSequenceMass that does take into account brackets, or should Fangfei write one?

Bill

On Tue, Jun 27, 2017 at 2:06 AM, Fangfei Zhang notifications@github.com wrote:

I checked that their fragment b/y ions are matched and correct. On my crux build the AVERAGE is still 922.895569, but both AVERAGE and MONO values are incorrect for bracketed mass. It's because the current app/PredictPeptideIons.cpp(line 116:117) calls Peptide::calcSequenceMass to calculate mass which cannot deal with brackets.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/crux-toolkit/crux-toolkit/issues/389#issuecomment-311215446, or mute the thread https://github.com/notifications/unsubscribe-auth/AFieqNGifl2Eo1afB9pf0sngc_nenmBxks5sIEeUgaJpZM4N1Gl1 .

kaipot commented 7 years ago

I actually just recently added a method that I think will work here, it is defined at src/model/Modification.cpp with the signature void Modification::FromSeq(const string& seq, string* outSeq, vector<Modification>* outMods). It takes a sequence as a string and returns the unmodified sequence and a vector of the modifications. for example:

string unmodifiedSequence;
vector<Modification> mods;
Modification::FromSeq(peptide_sequence, &unmodifiedSequence, &mods);

Then we can call Peptide::calcSequenceMass on the unmodified sequence, and loop over the vector of modifications and sum the masses and add it, something like this:

FLOAT_T averageMass = Crux::Peptide::calcSequenceMass(unmodifiedSequence, AVERAGE);
FLOAT_T monoMass = Crux::Peptide::calcSequenceMass(unmodifiedSequence, MONO);
for (vector<Modification>::const_iterator i = mods.begin(); i != mods.end(); i++) {
  averageMass += i->DeltaMass();
  monoMass += i->DeltaMass();
}