PyProphet / pyprophet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.
http://www.openswath.org
BSD 3-Clause "New" or "Revised" License

Nearly identical PEP/q-value distributions for target and decoy, but bimodal score distribution #99

Closed: kairenchen721 closed this 2 years ago

kairenchen721 commented 2 years ago

Thank you for making PyProphet. I am really excited to use it for my project.

I used the Docker option described in the OpenSWATH docs, and this is the command I ran:

pyprophet peptide --in 20181201_FlMe_SA_diaPASEF_200ng_HeLa_py3.osw

   qvalue    pvalue    svalue       pep  ...            tn            fp            fn     cutoff
0    0.00  0.000011  0.504993  0.000019  ...  23541.547695      0.270487  35210.452305   3.927691
1    0.01  0.027242  0.893162  0.162110  ...  22900.493991    641.324190   7599.506009   1.972629
2    0.02  0.057287  0.928787  0.286734  ...  22193.171023   1348.647159   5065.828977   1.613684
3    0.05  0.154524  0.971287  0.576165  ...  19904.041279   3637.776903   2045.958721   1.056899
4    0.10  0.334348  0.995971  0.843348  ...  15670.652454   7871.165727    292.347546   0.489151
5    0.20  0.758247  1.000000  1.000000  ...   5691.312649  17850.505533   -271.312649  -0.643834
6    0.30  0.999989  1.000000  1.000000  ...      0.270487  23541.547695     -0.270487  -3.230400
7    0.40       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN
8    0.50       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN

[9 rows x 12 columns]

================================================================================
   qvalue    pvalue    svalue       pep  ...            tn            fp            fn     cutoff
0    0.00  0.000011  0.504993  0.000019  ...  23541.547695      0.270487  35210.452305   3.927691
1    0.01  0.027242  0.893162  0.162110  ...  22900.493991    641.324190   7599.506009   1.972629
2    0.02  0.057287  0.928787  0.286734  ...  22193.171023   1348.647159   5065.828977   1.613684
3    0.05  0.154524  0.971287  0.576165  ...  19904.041279   3637.776903   2045.958721   1.056899
4    0.10  0.334348  0.995971  0.843348  ...  15670.652454   7871.165727    292.347546   0.489151
5    0.20  0.758247  1.000000  1.000000  ...   5691.312649  17850.505533   -271.312649  -0.643834
6    0.30  0.999989  1.000000  1.000000  ...      0.270487  23541.547695     -0.270487  -3.230400
7    0.40       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN
8    0.50       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN

[9 rows x 12 columns]

================================================================================
   qvalue    pvalue    svalue       pep  ...            tn            fp            fn     cutoff
0    0.00  0.000012  0.554719  0.000014  ...  19312.498433      0.228840  33194.501567   3.779354
1    0.01  0.034647  0.888135  0.157204  ...  18643.598963    669.128309   8339.401037   1.885508
2    0.02  0.072979  0.926413  0.278727  ...  17903.301398   1409.425875   5485.698602   1.511289
3    0.05  0.197052  0.969980  0.556820  ...  15507.117223   3805.610050   2237.882777   0.916645
4    0.10  0.428147  0.998476  0.816992  ...  11044.049706   8268.677566    118.950294   0.252429
5    0.20  0.966088  1.000000  1.000000  ...    654.940226  18657.787046    -87.940226  -1.794583
6    0.30  0.999929  1.000000  1.000000  ...      1.373040  19311.354232     -1.373040  -3.144660
7    0.40       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN
8    0.50       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN

[9 rows x 12 columns]

================================================================================
   qvalue    pvalue    svalue       pep  ...            tn            fp            fn     cutoff
0    0.00  0.000012  0.511254  0.000019  ...  24148.811503      0.279406  34384.188497   3.915643
1    0.01  0.026391  0.896830  0.166087  ...  23511.765023    637.325886   7258.234977   1.970259
2    0.02  0.055293  0.930276  0.297998  ...  22813.807958   1335.282951   4905.192042   1.626648
3    0.05  0.148895  0.971151  0.578817  ...  20553.410579   3595.680330   2029.589421   1.078977
4    0.10  0.321613  0.993667  0.847170  ...  16382.432573   7766.658336    454.567427   0.519142
5    0.20  0.732350  1.000000  0.996909  ...   6463.507116  17685.583794   -388.507116  -0.576950
6    0.30  0.999931  1.000000  1.000000  ...      1.676438  24147.414471     -1.676438  -3.051916
7    0.40       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN
8    0.50       NaN       NaN       NaN  ...           NaN           NaN           NaN        NaN

[9 rows x 12 columns]

Killed

[Image: score distribution 201812]

[Attachment: 20181201_FlMe_SA_diaPASEF_200ng_HeLa_py3.osw_6514992444606274777_run-specific_peptide.pdf]

[Image: qvalue distribution 201812]

[Image: pep distribution 201812]

I am not sure why the distributions look like this. Is it normal for the PEP/q-value distributions to look like this?

Any help would be appreciated, thank you.
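For readers less familiar with the qvalue and cutoff columns in the log above: a q-value is the smallest FDR at which an identification would still be accepted, so true targets pile up near 0 while decoys are pushed toward high values. The sketch below computes q-values with a simplified target-decoy estimate (decoys above the cutoff divided by targets above the cutoff), assuming π0 = 1 and no tied scores. It is not PyProphet's estimator, which is more involved (for example, it estimates the fraction of null hypotheses, π0); the function name `target_decoy_qvalues` is hypothetical.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Simplified target-decoy q-values (assumes pi0 = 1 and no tied scores).

    FDR at a cutoff = (#decoys scoring >= cutoff) / (#targets scoring >= cutoff);
    the q-value of an item is the minimum FDR over all cutoffs that still accept it.
    """
    # Visit items from best to worst score, tracking the FDR at each cutoff.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / max(n_target, 1))

    # A running minimum from the worst score upward turns FDRs into q-values.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[pos])
        qvalues[order[pos]] = running_min
    return qvalues


scores = [5.0, 4.0, 3.0, 2.0, 1.0]
is_decoy = [False, False, True, False, True]
print(target_decoy_qvalues(scores, is_decoy))
# -> [0.0, 0.0, 0.3333333333333333, 0.3333333333333333, 0.6666666666666666]
```

The running minimum is what makes q-values monotone in the score, which is why a q-value histogram is expected to be heavily skewed rather than mirroring the bimodal score distribution.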

grosenberger commented 2 years ago

Hi, thanks for your interest in PyProphet. Could you please provide a bit more data? The ideal would be the OSW file (directly after OpenSWATH) and the complete set of parameters and commands you used in PyProphet.

kairenchen721 commented 2 years ago

Ah, sorry, I forgot to mention that I was using the default parameter values. The commands I used were just:

docker pull openswath/openswath:latest
docker run --name osw_tutorial --rm -v ~/Desktop/:/data -i -t openswath/openswath:latest
pyprophet peptide --in 20181201_FlMe_SA_diaPASEF_200ng_HeLa_py3.osw

Is there a preferred way to share the OSW file? Is Google Drive OK? I think Dropbox's free storage is smaller than my file.

kairenchen721 commented 2 years ago

If Google Drive works, here is the link: https://drive.google.com/file/d/1hU7MZBcGPvLMNRgs8xfG4htuNvhaW6BJ/view?usp=sharing

Please let me know if it does not work for you; I will gladly use something else.

kairenchen721 commented 2 years ago

Hi, sorry, I was just wondering whether the Google Drive link works.

singjc commented 2 years ago

Hi Kai, I can confirm that the Google Drive link works.

I am wondering how you extracted the data to generate those distribution plots. I am assuming they are run-specific distributions for run 6514992444606274777 only, since your example output PDF has this RUN_ID.

If I use the following SQL query to plot the scores for peptide inference, I get the distributions below, which is more or less what I would expect them to look like. The SCORE distribution matches your first plot, but the QVALUE and PEP distributions do not match your second and third plots. Did you extract the data in the same way, or did you threshold anything?

SQL Query

SELECT *
FROM PEPTIDE
INNER JOIN SCORE_PEPTIDE ON SCORE_PEPTIDE.PEPTIDE_ID = PEPTIDE.ID
WHERE SCORE_PEPTIDE.CONTEXT = 'run-specific'
  AND SCORE_PEPTIDE.RUN_ID = 6514992444606274777

[Image: SCORE, QVALUE, and PEP distributions from the query above]
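Because an OSW file is an SQLite database, the same run-specific query can be run from Python with the standard-library sqlite3 module. The sketch below builds a tiny in-memory stand-in so it runs on its own; the tables are a simplified, hypothetical subset of the real PEPTIDE and SCORE_PEPTIDE columns, and the run IDs 111 and 222 and the helper `run_specific_scores` are made up for illustration. With a real file you would connect to the .osw path instead. Passing RUN_ID as a query parameter makes it easy to look at one run at a time rather than pooling runs by accident.

```python
import sqlite3
from statistics import median

# Hypothetical, simplified stand-in for an OSW file (the real schema has more
# columns). With a real file: con = sqlite3.connect("path/to/file.osw")
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, DECOY INTEGER);
CREATE TABLE SCORE_PEPTIDE (
    CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER,
    SCORE REAL, QVALUE REAL, PEP REAL);
INSERT INTO PEPTIDE VALUES (1, 0), (2, 0), (3, 1), (4, 1);
INSERT INTO SCORE_PEPTIDE VALUES
    ('run-specific', 111, 1,  3.9, 0.00, 0.01),
    ('run-specific', 111, 2,  2.1, 0.01, 0.15),
    ('run-specific', 111, 3, -1.5, 0.60, 1.00),
    ('run-specific', 111, 4, -2.0, 0.80, 1.00),
    ('run-specific', 222, 1,  3.5, 0.00, 0.02);
""")


def run_specific_scores(con, run_id):
    """Return (DECOY, SCORE, QVALUE, PEP) rows for a single run (hypothetical helper)."""
    sql = """
        SELECT PEPTIDE.DECOY, SCORE_PEPTIDE.SCORE,
               SCORE_PEPTIDE.QVALUE, SCORE_PEPTIDE.PEP
        FROM PEPTIDE
        INNER JOIN SCORE_PEPTIDE ON SCORE_PEPTIDE.PEPTIDE_ID = PEPTIDE.ID
        WHERE SCORE_PEPTIDE.CONTEXT = 'run-specific'
          AND SCORE_PEPTIDE.RUN_ID = ?
    """
    return con.execute(sql, (run_id,)).fetchall()


rows = run_specific_scores(con, 111)
# Summarize q-values separately for targets (DECOY = 0) and decoys (DECOY = 1).
for decoy in (0, 1):
    qvalues = [row[2] for row in rows if row[0] == decoy]
    label = "decoy" if decoy else "target"
    print(label, len(qvalues), "median QVALUE:", median(qvalues))
```

The per-group lists can be passed to any plotting library to reproduce target and decoy histograms for one run.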

kairenchen721 commented 2 years ago

Oh, I did not notice that there are 3 separate runs, so I did not extract only that run. Thank you!
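A quick way to check how many runs an OSW file contains, and therefore whether an unfiltered query mixes them, is to count the scored peptides per RUN_ID. The sketch below uses a hypothetical in-memory SCORE_PEPTIDE table with three made-up runs; the helper `rows_per_run` is also hypothetical, and against a real file you would connect to the .osw path instead.

```python
import sqlite3

# Hypothetical toy SCORE_PEPTIDE table standing in for a multi-run OSW file.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE SCORE_PEPTIDE "
    "(CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, QVALUE REAL)"
)
con.executemany(
    "INSERT INTO SCORE_PEPTIDE VALUES (?, ?, ?, ?)",
    [("run-specific", run, pep, 0.01 * pep) for run in (111, 222, 333) for pep in range(1, 5)],
)


def rows_per_run(con, context="run-specific"):
    """Count scored peptides per RUN_ID for one CONTEXT (hypothetical helper)."""
    sql = """
        SELECT RUN_ID, COUNT(*)
        FROM SCORE_PEPTIDE
        WHERE CONTEXT = ?
        GROUP BY RUN_ID
        ORDER BY RUN_ID
    """
    return dict(con.execute(sql, (context,)).fetchall())


counts = rows_per_run(con)
print(counts)  # -> {111: 4, 222: 4, 333: 4}
```

If the dictionary has more than one key, a query without a RUN_ID filter returns the rows of all runs together (12 rows here instead of 4), and the resulting histograms mix the runs.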

kairenchen721 commented 2 years ago

Thank you so much! The actual problem was that I had a DISTINCT in the SELECT clause.
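The thread does not say which columns the DISTINCT applied to, so the sketch below only illustrates one way it can distort a histogram, under that assumption: when many peptides share the same values (for example QVALUE = 1.0 and PEP = 1.0 in the low-scoring tail), SELECT DISTINCT over those columns collapses them into a single row, and the spike disappears from the plot. The table and its values are hypothetical toy data.

```python
import sqlite3
from collections import Counter

# Hypothetical toy data: two confident peptides plus six low-scoring peptides
# that all share QVALUE = 1.0 and PEP = 1.0.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SCORE_PEPTIDE (PEPTIDE_ID INTEGER, QVALUE REAL, PEP REAL)")
con.executemany(
    "INSERT INTO SCORE_PEPTIDE VALUES (?, ?, ?)",
    [(1, 0.0, 0.01), (2, 0.01, 0.10)] + [(i, 1.0, 1.0) for i in range(3, 9)],
)

# Without DISTINCT, every peptide contributes one value to the histogram.
all_rows = con.execute("SELECT QVALUE, PEP FROM SCORE_PEPTIDE").fetchall()

# With DISTINCT, identical (QVALUE, PEP) pairs collapse into one row each,
# so the six peptides at 1.0 count only once.
distinct_rows = con.execute("SELECT DISTINCT QVALUE, PEP FROM SCORE_PEPTIDE").fetchall()

print(len(all_rows), len(distinct_rows))  # -> 8 3
print(Counter(q for q, _ in all_rows)[1.0], Counter(q for q, _ in distinct_rows)[1.0])  # -> 6 1
```

Keeping a unique key such as PEPTIDE_ID (and RUN_ID) in the selected columns, or dropping DISTINCT, preserves one row per scored peptide.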