compomics / DeepLC

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
https://iomics.ugent.be/deeplc
Apache License 2.0

Low prediction accuracy for versions 2.0.4+ #54

Closed by markmipt 1 year ago

markmipt commented 1 year ago

Hi,

I've noticed that, starting with version 2.0.4, I get much worse RT prediction accuracy than with version 1.1.2. I have a set of ~2000 peptides split into two parts: one for DeepLC calibration and one for evaluating the predicted RTs. With the old version, the standard deviation of the difference between predicted and experimental RTs is 0.124 min; with the new one (2.0.4), it is 0.321 min.
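For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the split and the error metric described above. The function names, the 50/50 split, and the use of the sample standard deviation are assumptions for illustration; the exact procedure used for the numbers above may differ.

```python
import random
import statistics


def split_peptides(rows, calibration_fraction=0.5, seed=0):
    """Shuffle rows reproducibly and split them into (calibration, estimation).

    The 50/50 default and the fixed seed are illustrative assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


def rt_error_std(predicted, experimental):
    """Sample standard deviation of (predicted - experimental) RTs.

    The result is in the same unit as the inputs (e.g. minutes).
    """
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    diffs = [p - e for p, e in zip(predicted, experimental)]
    return statistics.stdev(diffs)
```

Running `rt_error_std` on the estimation half for each DeepLC version gives one number per version that can be compared directly.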

All peptides are unmodified except for fixed Carbamidomethyl of C. I run DeepLC with basic command-line options (deeplc_path, '--file_pred', estimate_file_name, '--file_cal', calibrate_file_name, '--file_pred_out', out_file_name). The only unusual thing about my data is a peptide FDR of ~5-20%.
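The argument list above can be sketched as a small Python wrapper. Only the three flags and the variable names come from the command quoted above; the helper names and the use of `subprocess.run` are assumptions about how such a call might be launched.

```python
import subprocess


def build_deeplc_command(deeplc_path, estimate_file_name,
                         calibrate_file_name, out_file_name):
    """Return the DeepLC argument list with the options quoted above."""
    return [
        deeplc_path,
        "--file_pred", estimate_file_name,    # peptides to predict RTs for
        "--file_cal", calibrate_file_name,    # peptides used for calibration
        "--file_pred_out", out_file_name,     # where predictions are written
    ]


def run_deeplc(deeplc_path, estimate_file_name,
               calibrate_file_name, out_file_name):
    """Run DeepLC and raise CalledProcessError on a non-zero exit code."""
    cmd = build_deeplc_command(deeplc_path, estimate_file_name,
                               calibrate_file_name, out_file_name)
    return subprocess.run(cmd, check=True)
```

With the attached files, this would be called as `run_deeplc(deeplc_path, "test_estimate.txt", "test_calibrate.txt", out_file_name)`, where `deeplc_path` points at the installed executable for the version under test.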

I see the same behavior with another dataset, and also with the latest DeepLC version (2.1.9).

Please find attached the DeepLC logs, the test files, and figures with the results.

Regards, Mark

[Figures: deeplc_bug_hist, deeplc_bug_scatterplot]

test_calibrate.txt test_estimate.txt

Log_DeepLC112.txt

Log_DeepLC204.txt

RobbinBouwmeester commented 1 year ago

Hi @markmipt,

Thank you for spotting this bug!

Last week someone else also contacted me to report that the feature calculation differs between these versions (i.e., the calculated compositions are not the same).

This difference in the calculated features is also the cause of the lower performance. I will need to retrain the models with the new features to account for this, and I will do the retraining ASAP.

Kind regards,

Robbin

RobbinBouwmeester commented 1 year ago

The v2.2.0 release should fix this issue (7aa082d).

markmipt commented 1 year ago

Thanks!

Now the results look similar.

[Figures: deeplc_bug_scatterplot2, deeplc_bug_hist2]

Regards, Mark