compomics / DeepLC

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
https://iomics.ugent.be/deeplc
Apache License 2.0
52 stars, 18 forks

Blank spot after calibration #10

Closed: courcelm closed this issue 4 years ago.

courcelm commented 4 years ago

Hello,

I noticed that some regions of the predicted retention time (RT) range contain no predictions at all. In this scatter plot you can see blank rows where no value is predicted.

[Image: scatter plot with calibration, showing blank rows]

I don't see this without calibration:

[Image: scatter plot without calibration]

Any idea about the source of this issue?

Thanks

RobbinBouwmeester commented 4 years ago

Hi courcel,

I have seen this incorrect calibration behaviour on some other data sets before. There still seems to be a bug in the calibration function.

My apologies for that. I will look into it next week and hopefully resolve it :).

RobbinBouwmeester commented 4 years ago

The calibration function was updated just now and will be available in version 0.1.15.

I hope that fixes the problem; please let me know if it does :).

courcelm commented 4 years ago

Thanks, I will try it once it is available. Will you release a Docker image at the same time?

courcelm commented 4 years ago

@RobbinBouwmeester

I just tried version 0.1.15.

On some data sets I get this error:

    1292/1292 [==============================] - 2s 1ms/sample
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.2718189239501953
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.2718189239501953,0.5436378479003906
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.5436378479003906,0.8154567718505858
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.8154567718505858,1.0872756958007812
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.0872756958007812,1.3590946197509766
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.3590946197509766,1.630913543701172
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.6309135437011717,1.902732467651367
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.902732467651367,2.1745513916015624
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.1745513916015624,2.4463703155517575
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.4463703155517575,2.7181892395019527
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.718189239501953,2.9900081634521483
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.9900081634521483,3.2618270874023434
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 3.2618270874023434,3.5336460113525385
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 3.533646011352539,3.805464935302734
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 3.805464935302734,4.07728385925293
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 4.07728385925293,4.349102783203125
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 4.349102783203125,4.62092170715332
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 4.62092170715332,4.892740631103515
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 4.892740631103515,5.16455955505371
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 5.164559555053711,5.436378479003906
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 5.436378479003906,5.708197402954101
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 5.708197402954101,5.9800163269042965
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 5.9800163269042965,6.251835250854492
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 6.251835250854492,6.523654174804687
    ERROR:root:Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 6.523654174804687,6.795473098754882
    ERROR:root:The measured tr list is empty, not able to calibrate
    An exception has occurred, use %tb to see the full traceback.
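The bin edges in this log, together with the `split_val`/`np.arange` lines quoted further down, suggest that calibration splits the predicted RT range from 0 up to an upper bound into `split_cal` equal-width bins and skips every bin without points. A minimal sketch of that check, assuming half-open `[start, end)` bins over predicted RT; `find_empty_bins` is a hypothetical helper, not part of DeepLC:

```python
import numpy as np


def find_empty_bins(predicted_tr, upper, split_cal):
    """Hypothetical helper (not part of DeepLC): return the [start, end) bins
    of predicted RT, between 0 and `upper`, that contain no points."""
    predicted = np.asarray(predicted_tr, dtype=float)
    split_val = upper / split_cal  # bin width, mirroring split_val = predicted_tr[-1]/self.split_cal
    empty = []
    for start in np.arange(0.0, upper, split_val):
        end = start + split_val
        # A bin is empty when no predicted RT falls inside [start, end)
        if not np.any((predicted >= start) & (predicted < end)):
            empty.append((float(start), float(end)))
    return empty


# Made-up predictions that all lie above the upper bound used to build the bins:
# every bin is empty, much like the run of "Skipping calibration step" messages.
print(len(find_empty_bins([8.0, 9.5, 12.0], upper=5.0, split_cal=5)))  # prints 5
```

If the upper bound does not match the actual spread of the predictions, bins end up empty regardless of how many points the data set has.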

You also forgot to remove a debugging `print(v)` statement.

courcelm commented 4 years ago

I inspected the calibration function.

I believe there is a range boundary issue: `predicted_tr[-1]` is not the maximum predicted value, because the sort is done on `measured_tr`.
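To illustrate the boundary issue with made-up numbers: once the (measured, predicted) pairs are sorted by measured RT, the last predicted value belongs to the peptide with the largest measured RT, which need not be the largest prediction. The values below are hypothetical toy data, not taken from DeepLC:

```python
# Hypothetical toy data (made up for illustration, not from DeepLC).
measured_tr = [10.0, 35.0, 20.0, 50.0]
predicted_tr = [12.0, 30.0, 55.0, 47.0]

# Sort the (measured, predicted) pairs by measured RT, i.e. key=lambda pair: pair[0]
by_measured = sorted(zip(measured_tr, predicted_tr), key=lambda pair: pair[0])
last_predicted = by_measured[-1][1]

print(last_predicted)     # 47.0, the prediction for the largest measured RT
print(max(predicted_tr))  # 55.0, the actual largest prediction
```

Assuming the bins are half-open intervals of width `split_val` built with `np.arange(0.0, predicted_tr[-1], split_val)`, the prediction of 55.0 would fall outside every bin.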

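```python
# Original: split_val = predicted_tr[-1]/self.split_cal
# Workaround: hard-code the upper bound of the predicted range to 10
split_val = 10/self.split_cal

# Original: for range_calib_number in np.arange(0.0,predicted_tr[-1],split_val):
for range_calib_number in np.arange(0.0,10,split_val):
    ...  # per-bin calibration, unchanged
```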
courcelm commented 4 years ago

I ran the prediction with a few of the models from your paper on two data sets, using the fix above. It removes the blank spots, but the calibration is not always right.

Here the points are not on the diagonal:

[Image: scatter plot]

Here there is an issue at the end of the range:

[Image: scatter plot]

courcelm commented 4 years ago

I investigated the two cases above.

In the first case, the calibration curves had only two points. I believe the sorting should be done on the predicted tr instead of the measured tr:

`tr_sort = [(mtr, ptr) for mtr, ptr in sorted(zip(measured_tr, predicted_tr), key=lambda pair: pair[1])]`

Note the sort key index: 1 instead of 0. With this change, the code change I suggested above is no longer needed.

In my tests, this fixes the calibration.

As for the second plot, I think the problem is rather that the selected model does not spread the predicted values well for my data set, which is unrelated to this issue.

RobbinBouwmeester commented 4 years ago

That is great! Thank you for digging so deep and resolving the issue.