compomics / moFF

A modest Feature Finder (moFF) to extract MS1 intensities from Thermo raw file
Apache License 2.0

low percent of shared peptides between files #10

Closed ningzhibin closed 7 years ago

ningzhibin commented 7 years ago

Hello, I successfully ran moFF on both the sample data and my own data. However, I found a very low percentage of shared peptides between files (replicates); see the log file below.
I am wondering whether I did something wrong or whether the sample .txt files provided are just incomplete. Could you post some result files for comparison?

Reading file: f1_folder/20080311_CPTAC6_07_6A005.txt
Reading file: f1_folder/20080313_CPTAC6_07_6A005.txt
Reading file: f1_folder/20080315_CPTAC6_07_6A005.txt
Read input --> done
Outlier Filtering is active
Number of replicates 3,
Pairwise model computation ----
matching in f1_folder/20080311_CPTAC6_07_6A005.txt
Matching f1_folder/20080311_CPTAC6_07_6A005.txt peptide in searching in f1_folder/20080313_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 4031 , 5711
Peptide (mass + sequence) added size 5694
Peptide (mass + sequence) )shared 30
Outlier founded 3 w.r.t 30
Size trainig shared peptide , 27 27
Mean absolute error training : 60.9602 sec
Matching f1_folder/20080311_CPTAC6_07_6A005.txt peptide in searching in f1_folder/20080315_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 4031 , 5183
Peptide (mass + sequence) added size 5186
Peptide (mass + sequence) )shared 10
Outlier founded 1 w.r.t 10
Size trainig shared peptide , 9 9
Mean absolute error training : 10.0916 sec
matching in f1_folder/20080313_CPTAC6_07_6A005.txt
Matching f1_folder/20080313_CPTAC6_07_6A005.txt peptide in searching in f1_folder/20080311_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 5711 , 4031
Peptide (mass + sequence) added size 4011
Peptide (mass + sequence) )shared 30
Outlier founded 3 w.r.t 30
Size trainig shared peptide , 27 27
Mean absolute error training : 61.9955 sec
Matching f1_folder/20080313_CPTAC6_07_6A005.txt peptide in searching in f1_folder/20080315_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 5711 , 5183
Peptide (mass + sequence) added size 5153
Peptide (mass + sequence) )shared 43
Outlier founded 4 w.r.t 43
Size trainig shared peptide , 39 39
Mean absolute error training : 18.4202 sec
matching in f1_folder/20080315_CPTAC6_07_6A005.txt
Matching f1_folder/20080315_CPTAC6_07_6A005.txt peptide in searching in f1_folder/20080311_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 5183 , 4031
Peptide (mass + sequence) added size 4031
Peptide (mass + sequence) )shared 10
Outlier founded 1 w.r.t 10
Size trainig shared peptide , 9 9
Mean absolute error training : 9.8872 sec
Matching f1_folder/20080315_CPTAC6_07_6A005.txt peptide in searching in f1_folder/20080313_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 5183 , 5711
Peptide (mass + sequence) added size 5682
Peptide (mass + sequence) )shared 43
Outlier founded 4 w.r.t 43
Size trainig shared peptide , 39 39
Mean absolute error training : 19.5383 sec
Combination of the model --------
Weighted combination Unweighted :
Predict rt for the exp. in f1_folder/20080311_CPTAC6_07_6A005.txt
Matching peptides found in f1_folder/20080313_CPTAC6_07_6A005.txt
Matching peptides found in f1_folder/20080315_CPTAC6_07_6A005.txt
Before adding f1_folder/20080311_CPTAC6_07_6A005.txt contains 4041
After MBR f1_folder/20080311_CPTAC6_07_6A005.txt contains: 10611 peptides
matched features 6570 MS2 features 4041
Predict rt for the exp. in f1_folder/20080313_CPTAC6_07_6A005.txt
Matching peptides found in f1_folder/20080311_CPTAC6_07_6A005.txt
Matching peptides found in f1_folder/20080315_CPTAC6_07_6A005.txt
Before adding f1_folder/20080313_CPTAC6_07_6A005.txt contains 5725
After MBR f1_folder/20080313_CPTAC6_07_6A005.txt contains: 9998 peptides
matched features 4273 MS2 features 5725
Predict rt for the exp. in f1_folder/20080315_CPTAC6_07_6A005.txt
Matching peptides found in f1_folder/20080311_CPTAC6_07_6A005.txt
Matching peptides found in f1_folder/20080313_CPTAC6_07_6A005.txt
-- Predicted negative RT: those peptide will be deleted
Before adding f1_folder/20080315_CPTAC6_07_6A005.txt contains 5196
After MBR f1_folder/20080315_CPTAC6_07_6A005.txt contains: 10036 peptides
matched features 4840 MS2 features 5196

Maux82 commented 7 years ago

Hi,

The log looks fine. In the next few days I will look into it in more detail and let you know.

Did you experience the same behaviour on your own data?

Andrea

ningzhibin commented 7 years ago

Yes, I did. Do you get more shared peptides when running on the same dataset? Thanks a lot.

Maux82 commented 7 years ago

Hi ,

I think I have found a possible cause, but I want to be sure that it explains this behaviour.

Can you run moFF for me on the attached sample data set? The input files are the same, but I changed the names of some fields.

Do you still get a low number of shared peptides?

Thanks

sample_data_patchheader.zip

ningzhibin commented 7 years ago

I think it is right this time; see the log below. I noticed that you changed "calc_mass" to "exp_mass". Is this the only thing you changed?

Reading file: ./20080311_CPTAC6_07_6A005.txt
Reading file: ./20080313_CPTAC6_07_6A005.txt
Reading file: ./20080315_CPTAC6_07_6A005.txt
Read input --> done
Outlier Filtering is active
Number of replicates 3,
Pairwise model computation ----
matching in ./20080311_CPTAC6_07_6A005.txt
Matching ./20080311_CPTAC6_07_6A005.txt peptide in searching in ./20080313_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 3836 , 5420
Peptide (mass + sequence) added size 3592
Peptide (mass + sequence) )shared 1914
Outlier founded 133 w.r.t 1914
Size trainig shared peptide , 1781 1781
Mean absolute error training : 14.7310 sec
Matching ./20080311_CPTAC6_07_6A005.txt peptide in searching in ./20080315_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 3836 , 4924
Peptide (mass + sequence) added size 3173
Peptide (mass + sequence) )shared 1812
Outlier founded 130 w.r.t 1812
Size trainig shared peptide , 1682 1682
Mean absolute error training : 13.4474 sec
matching in ./20080313_CPTAC6_07_6A005.txt
Matching ./20080313_CPTAC6_07_6A005.txt peptide in searching in ./20080311_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 5420 , 3836
Peptide (mass + sequence) added size 1948
Peptide (mass + sequence) )shared 1914
Outlier founded 133 w.r.t 1914
Size trainig shared peptide , 1781 1781
Mean absolute error training : 14.8355 sec
Matching ./20080313_CPTAC6_07_6A005.txt peptide in searching in ./20080315_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 5420 , 4924
Peptide (mass + sequence) added size 2468
Peptide (mass + sequence) )shared 2496
Outlier founded 135 w.r.t 2496
Size trainig shared peptide , 2361 2361
Mean absolute error training : 11.6169 sec
matching in ./20080315_CPTAC6_07_6A005.txt
Matching ./20080315_CPTAC6_07_6A005.txt peptide in searching in ./20080311_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 4924 , 3836
Peptide (mass + sequence) added size 2057
Peptide (mass + sequence) )shared 1812
Outlier founded 130 w.r.t 1812
Size trainig shared peptide , 1682 1682
Mean absolute error training : 13.7946 sec
Matching ./20080315_CPTAC6_07_6A005.txt peptide in searching in ./20080313_CPTAC6_07_6A005.txt
Peptide unique (mass + sequence) 4924 , 5420
Peptide (mass + sequence) added size 2974
Peptide (mass + sequence) )shared 2496
Outlier founded 135 w.r.t 2496
Size trainig shared peptide , 2361 2361
Mean absolute error training : 11.7658 sec
Combination of the model --------
Weighted combination Unweighted :
Predict rt for the exp. in ./20080311_CPTAC6_07_6A005.txt
Matching peptides found in ./20080313_CPTAC6_07_6A005.txt
Matching peptides found in ./20080315_CPTAC6_07_6A005.txt
-- Predicted negative RT: those peptide will be deleted
Before adding ./20080311_CPTAC6_07_6A005.txt contains 4041
After MBR ./20080311_CPTAC6_07_6A005.txt contains: 10504 peptides
matched features 6463 MS2 features 4041
Predict rt for the exp. in ./20080313_CPTAC6_07_6A005.txt
Matching peptides found in ./20080311_CPTAC6_07_6A005.txt
Matching peptides found in ./20080315_CPTAC6_07_6A005.txt
Before adding ./20080313_CPTAC6_07_6A005.txt contains 5725
After MBR ./20080313_CPTAC6_07_6A005.txt contains: 9960 peptides
matched features 4235 MS2 features 5725
Predict rt for the exp. in ./20080315_CPTAC6_07_6A005.txt
Matching peptides found in ./20080311_CPTAC6_07_6A005.txt
Matching peptides found in ./20080313_CPTAC6_07_6A005.txt
-- Predicted negative RT: those peptide will be deleted
Before adding ./20080315_CPTAC6_07_6A005.txt contains 5196
After MBR ./20080315_CPTAC6_07_6A005.txt contains: 9984 peptides
matched features 4788 MS2 features 5196
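
For a quick comparison of the two runs, the shared counts reported in the logs can be expressed as a fraction of the smaller file's unique peptides. The snippet below is only a sketch using the numbers printed for the 20080311 vs 20080313 pair; the helper name `shared_fraction` and the choice of the smaller file as denominator are assumptions, not something moFF reports.

```python
def shared_fraction(shared, unique_a, unique_b):
    """Shared peptides as a fraction of the smaller file's unique peptides.

    Hypothetical helper: the denominator choice is an assumption made
    for this comparison, not a moFF metric.
    """
    return shared / min(unique_a, unique_b)


# Numbers copied from the logs for the 20080311 vs 20080313 pair.
original_sample = shared_fraction(shared=30, unique_a=4031, unique_b=5711)
patched_sample = shared_fraction(shared=1914, unique_a=3836, unique_b=5420)

print(f"original sample data: {original_sample:.2%} shared")
print(f"patched sample data:  {patched_sample:.2%} shared")
```

With the original sample files fewer than 1% of the peptides are shared, while with the patched headers roughly half are.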

Maux82 commented 7 years ago

Hi,

Let me explain in more detail what the problem was and how I fixed it.

The MBR process is based on the unique peptides found across all the input runs. To mark the unique peptides, I used a key composed of the peptide sequence and the mass. This has a drawback: the mass must be the theoretical mass, otherwise the key does not work. Indeed, in the sample files the mass field referred to the experimental mass (by mistake), and the MBR failed as you pointed out.
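
The failure mode can be illustrated with a small sketch. Measured masses differ slightly from run to run, so a (sequence, experimental mass) key almost never matches across files, whereas a (sequence, theoretical mass) key does. This is not moFF's actual code: the tuples, the helper `shared_keys`, and all mass values below are made up for illustration.

```python
# Hypothetical identifications: (sequence, theoretical_mass, experimental_mass).
# All sequences and mass values are invented for illustration only.
run_a = [
    ("PEPTIDEK", 1000.5000, 1000.5012),
    ("SAMPLER", 1200.6000, 1200.5991),
    ("LVNELTEFAK", 1500.7000, 1500.7003),
]
run_b = [
    ("PEPTIDEK", 1000.5000, 1000.4989),
    ("SAMPLER", 1200.6000, 1200.6007),
    ("LVNELTEFAK", 1500.7000, 1500.6996),
]


def shared_keys(run_x, run_y, mass_index):
    """Intersect the (sequence, mass) keys of two runs.

    mass_index selects which mass goes into the key:
    1 = theoretical, 2 = experimental.
    """
    keys_x = {(row[0], row[mass_index]) for row in run_x}
    keys_y = {(row[0], row[mass_index]) for row in run_y}
    return keys_x & keys_y


shared_theoretical = shared_keys(run_a, run_b, mass_index=1)
shared_experimental = shared_keys(run_a, run_b, mass_index=2)

print(len(shared_theoretical), "shared with theoretical mass in the key")
print(len(shared_experimental), "shared with experimental mass in the key")
```

In this toy data all three peptides are present in both runs, yet none is shared once the experimental mass is part of the key.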

Now I use just the modified sequence as the key in the MBR process, so I do not have to check whether the mass field in the input file is experimental or theoretical. However, the "mod_peptide" field, which describes the peptide sequence and its modifications, is now mandatory in the input file if you run the entire workflow MBR+APEX (check moff_setting.properties). Moreover, it is now easier to check whether a matched peptide is modified or not.
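
A minimal sketch of the new keying, assuming tab-delimited input with a header row: it checks that the mandatory "mod_peptide" column is present and then matches runs on that column alone. The function `load_mod_peptide_keys`, the other column names, the modification notation, and the sample rows are all hypothetical; moFF's real implementation may differ.

```python
import csv
import io

REQUIRED_FIELD = "mod_peptide"  # field name taken from the comment above


def load_mod_peptide_keys(text):
    """Return the set of modified sequences in a tab-delimited table.

    Raises ValueError when the mandatory column is missing.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if REQUIRED_FIELD not in (reader.fieldnames or []):
        raise ValueError(f"input file lacks the '{REQUIRED_FIELD}' column")
    return {row[REQUIRED_FIELD] for row in reader}


def make_table(rows):
    """Build an in-memory tab-delimited table (stands in for an input file)."""
    header = ["peptide", "mod_peptide", "mass"]
    return "\n".join("\t".join(r) for r in [header] + rows) + "\n"


# Made-up rows; the mass column differs between runs but is not in the key.
run_a = make_table([
    ["PEPTIDEK", "NH2-PEPTIDEK-COOH", "1000.5012"],
    ["SAMPLEMR", "NH2-SAMPLEM<Mox>R-COOH", "1216.5991"],
])
run_b = make_table([
    ["PEPTIDEK", "NH2-PEPTIDEK-COOH", "1000.4989"],
    ["SAMPLEMR", "NH2-SAMPLEMR-COOH", "1200.6007"],
])

shared = load_mod_peptide_keys(run_a) & load_mod_peptide_keys(run_b)
print(sorted(shared))
```

Here the unmodified peptide matches despite the different mass values, and the oxidised and unoxidised forms of the second peptide stay distinct because the modification is part of the key.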

I have fixed both the master and multipr_thermo branches, so if you run moFF with the sample data now, it should work properly. Let me know if that is not the case.

Thanks for pointing out this bug. Cheers, Andrea