compomics / ms2pip

MS²PIP: Fast and accurate peptide spectrum prediction for multiple fragmentation methods, instruments, and labeling techniques.
https://ms2pip.readthedocs.io
Apache License 2.0

Negative predicted intensities? #31

Closed wyu closed 5 years ago

wyu commented 5 years ago

I'm wondering if anyone else is getting negative values in the output?

python3 ms2pipC.py -c TMT.cfg test.PEPREC

output:

spec_id,charge,ion,ionnumber,mz,prediction
test:3276:1:VNHVTLSQPK,3,B,1,329.2386,-7.034779
test:3276:1:VNHVTLSQPK,3,B,2,443.28152,-5.928182
test:3276:1:VNHVTLSQPK,3,B,3,580.34045,-5.3165855
test:3276:1:VNHVTLSQPK,3,B,4,679.4089,-7.5345697
test:3276:1:VNHVTLSQPK,3,B,5,780.45654,-7.809765
test:3276:1:VNHVTLSQPK,3,B,6,893.5406,-8.820357
test:3276:1:VNHVTLSQPK,3,B,7,980.57263,-9.817887
test:3276:1:VNHVTLSQPK,3,B,8,1108.6311,-9.670811
test:3276:1:VNHVTLSQPK,3,B,9,1205.6838,-9.889491
test:3276:1:VNHVTLSQPK,3,Y,1,376.27567,-6.188411
test:3276:1:VNHVTLSQPK,3,Y,2,473.32843,-4.9410367
test:3276:1:VNHVTLSQPK,3,Y,3,601.387,-8.501914
test:3276:1:VNHVTLSQPK,3,Y,4,688.41907,-6.054326
test:3276:1:VNHVTLSQPK,3,Y,5,801.5031,-8.502624
test:3276:1:VNHVTLSQPK,3,Y,6,902.5508,-7.804158
test:3276:1:VNHVTLSQPK,3,Y,7,1001.6192,-10.036649
test:3276:1:VNHVTLSQPK,3,Y,8,1138.6781,-10.053709
test:3276:1:VNHVTLSQPK,3,Y,9,1252.7211,-10.230613

test.PEPREC:

spec_id modifications peptide charge
test:3276:1:VNHVTLSQPK 0|TMT10|10|TMT10K VNHVTLSQPK 3

TMT.cfg:

am,57.02146,fix,C
ptm=Pyro_glu,-18.010565,opt,E
ptm=Pyro-glu,-17.026549,opt,Q
ptm=Pyro-cmC,39.994915,opt,C
ptm=PhosphoS,79.966331,opt,S
ptm=PhosphoT,79.966331,opt,T
ptm=PhosphoY,79.966331,opt,Y
ptm=TMT10K,229.162932,fix,K

nterm=TMT10,229.162932,fix,N-term

ptm=TMT10,229.162932,fix,N-term
ptm=Deamidated,0.984016,opt,N
# ptm=TMT10K,229.162932,opt,K ptm=TMT10,229.162932,opt,N-term ptm=Cam,57.02146,opt,C

wyu commented 5 years ago

I just did a fresh install on a Red Hat 6 Linux server and got negative predicted intensities on the bundled test.PEPREC as well.

Am I missing something?

Wen

more test_HCD_predictions.csv

spec_id,charge,ion,ionnumber,mz,prediction
peptide3,2,B,1,72.04435,-10.122614
peptide3,2,B,2,175.05354,-5.287869
peptide3,2,B,3,290.08047,-5.796749

RalfG commented 5 years ago

Hi! Yes, raw MS²PIP predictions are in log2 space (this is what the machine learning model is trained on). To 'unlog' the data, apply "(2 ** prediction) - 0.001" to all predictions (the 0.001 offset is added to the intensities before the log2 transform so that zero intensities do not produce a log(0) error, which is why it is subtracted again here). Optionally, and depending on your use case, you can also perform TIC or base peak normalization. I will add this to the documentation.
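For anyone landing here later, the conversion described in the reply above can be sketched in a few lines of Python. This is only an illustration, not part of MS²PIP itself: the helper names (`unlog`, `tic_normalize`, `base_peak_normalize`) are made up for this example, the column names are taken from the CSV output shown earlier in the thread, and clipping small negative results to zero is an assumption of this sketch (very low predictions such as -10.12 give a value slightly below zero after subtracting 0.001).

```python
import csv
import io

PSEUDOCOUNT = 0.001  # offset quoted in the reply above


def unlog(prediction):
    """Invert the log2 transform: (2 ** prediction) - 0.001.

    Clipping at zero is an assumption of this sketch, not MS2PIP behaviour.
    """
    return max(2.0 ** prediction - PSEUDOCOUNT, 0.0)


def tic_normalize(intensities):
    """Scale intensities so that they sum to 1 (total ion current)."""
    total = sum(intensities)
    return [i / total for i in intensities] if total > 0 else list(intensities)


def base_peak_normalize(intensities):
    """Scale intensities so that the most intense peak equals 1."""
    peak = max(intensities, default=0.0)
    return [i / peak for i in intensities] if peak > 0 else list(intensities)


# Three rows copied from the test_HCD_predictions.csv excerpt in this thread.
SAMPLE = """spec_id,charge,ion,ionnumber,mz,prediction
peptide3,2,B,1,72.04435,-10.122614
peptide3,2,B,2,175.05354,-5.287869
peptide3,2,B,3,290.08047,-5.796749
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
linear = [unlog(float(row["prediction"])) for row in rows]
tic = tic_normalize(linear)
base = base_peak_normalize(linear)

for row, lin, t, b in zip(rows, linear, tic, base):
    print(row["ion"], row["ionnumber"], f"{lin:.6f}", f"{t:.4f}", f"{b:.4f}")
```

In a real workflow the normalization would be applied per spec_id rather than across the whole file; the sample above contains a single peptide, so that grouping step is left out.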