compomics / DeepLC

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
https://iomics.ugent.be/deeplc
Apache License 2.0

Skipping calibration step? #14

Closed MarcIsak closed 3 years ago

MarcIsak commented 4 years ago

Hello,

I tried to use DeepLC 0.1.17 for predicting retention times for some peptides in a csv file. I also provided a calibration file to the software. When looking at the console output, I could see some strange errors like:

ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3977001953125,0.795400390625

I am not really sure what this means. Does it mean that some peptides in the calibration file are excluded because they have invalid RTs? Or does the program not use the calibration file at all? I did not encounter this issue before (but then I provided iRT values, not RTs in minutes, for the calibration).

I would be happy if you could help me understand this issue a little better (see attached console output).

Thank you in advance. :-)

Best,

Marc

(venv) marc@supercomputer:~/PythonProject/deeplc$ deeplc --file_pred /mnt/d/Data_MS2PIP/MSLibrarian_1/precursors_rt.csv --file_cal /mnt/d/Data_MS2PIP/MSLibrarian_1/precursors_rt_calib.csv --n_threads 8
2020-08-27 10:26:10.855515: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-27 10:26:10.855561: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-08-27 10:26:11 - INFO - Using DeepLC version 0.1.17
2020-08-27 10:26:12 - INFO - Selecting best model and calibrating predictions...
2020-08-27 10:26:12.706689: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-27 10:26:12.706736: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-27 10:26:12.706751: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (supercomputer): /proc/driver/nvidia/version does not exist
2020-08-27 10:26:12.706966: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-27 10:26:12.723027: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3311995000 Hz
2020-08-27 10:26:12.726908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4cf0b40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 10:26:12.727027: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
1/1 [==============================] - 0s 288us/step
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.3977001953125
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3977001953125,0.795400390625
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.795400390625,1.1931005859375001
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1931005859375001,1.59080078125
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.59080078125,1.9885009765625
1/1 [==============================] - 0s 317us/step
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.4063890075683594
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.4063890075683594,0.8127780151367188
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.8127780151367188,1.2191670227050782
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.2191670227050782,1.6255560302734375
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.6255560302734375,2.0319450378417967
1/1 [==============================] - 0s 252us/step
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.3999577331542969
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3999577331542969,0.7999154663085938
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.7999154663085938,1.1998731994628906
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1998731994628906,1.5998309326171876
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.5998309326171876,1.9997886657714845
1/1 [==============================] - 0s 257us/step
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.38717674255371093
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.38717674255371093,0.7743534851074219
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.7743534851074219,1.1615302276611328
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1615302276611328,1.5487069702148437
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.5487069702148437,1.9358837127685546
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.9358837127685546,2.3230604553222656
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.3230604553222656,2.7102371978759763
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.7102371978759763,3.0974139404296874
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 3.0974139404296874,3.4845906829833986
2020-08-27 10:26:22 - INFO - Making predictions using model: {'/home/marc/PythonProject/deeplc/venv/lib/python3.6/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5': '/home/marc/PythonProject/deeplc/venv/lib/python3.6/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5'}

RobbinBouwmeester commented 4 years ago

Dear Marc,

Those are indeed quite a few "errors", but they should not be a problem.

What I suspect is that your iRT peptides spanned a fairly nice range of retention times. If the new list of calibration peptides is not spread evenly across the retention times, some sub-divisions (or splits) may not contain any peptides to fit the calibration curve. In that case we take the next sub-division (or continue until we do have sufficient points for calibration) for the calibration curve.

So we just ignore that sub-division for calibration if there is insufficient data. I do not expect that ignoring certain parts of the chromatogram will make a huge difference, but of course for the best calibration I would advise good coverage.
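
To make the idea concrete, here is a minimal sketch of a split-wise calibration that skips sub-divisions without enough points and borrows the fit of the next usable one. This is not DeepLC's actual code: the function name `fit_split_calibration`, the equal-width splits, the `min_points` threshold, and the per-split straight-line fit are all assumptions made for illustration.

```python
import numpy as np


def fit_split_calibration(predicted, observed, n_splits=5, min_points=2):
    """Fit one straight line per equal-width split of the predicted range.

    Returns (edges, fits, skipped). Hypothetical helper, not DeepLC's API.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    edges = np.linspace(predicted.min(), predicted.max(), n_splits + 1)
    # Index of the split each point falls in; the maximum goes in the last split.
    idx = np.clip(np.searchsorted(edges, predicted, side="right") - 1, 0, n_splits - 1)

    fits = [None] * n_splits
    skipped = []
    for i in range(n_splits):
        mask = idx == i
        if mask.sum() < min_points:
            # Not enough calibration points here: remember the range and move on.
            skipped.append((float(edges[i]), float(edges[i + 1])))
            continue
        fits[i] = np.polyfit(predicted[mask], observed[mask], deg=1)

    if len(skipped) == n_splits:
        raise ValueError("no split has enough calibration points")

    # A skipped split borrows the fit of the next usable split,
    # or of the previous one when no later split is usable.
    filled = list(fits)
    for i in range(n_splits):
        if fits[i] is None:
            later = [f for f in fits[i + 1:] if f is not None]
            earlier = [f for f in fits[:i] if f is not None]
            filled[i] = later[0] if later else earlier[-1]
    return edges, filled, skipped


# Toy data: calibration points cluster at the start and end of the range.
pred = [0.0, 0.1, 0.15, 0.9, 1.0]
obs = [1.0, 1.2, 1.3, 2.8, 3.0]
edges, fits, skipped = fit_split_calibration(pred, obs)
print(len(skipped), "of", len(fits), "splits were skipped")
```

Each entry in `skipped` corresponds to one "Skipping calibration step" message in this simplified picture; predictions in those ranges still get calibrated, just with a neighbouring split's fit.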

The "error" is more of a warning, and I will change this in the coming version together with a clearer message.

Does that solve your issue?

Kind regards,

Robbin

RobbinBouwmeester commented 3 years ago

Hi Marc,

Just checking in whether the comment above solved your issue. If so, I will close this issue.

Kind regards,

Robbin

MarcIsak commented 3 years ago

Ok, so one should try to have calibration peptides whose RTs are dispersed along the entire MS run time? I think the calibration peptides I used had RTs in the entire range, but there was definitely a normal distribution, i.e. relatively few peptides at the beginning and the end of the gradient.

Perhaps a more even RT distribution along the entire RT range would be better?

Best,

Marc

RobbinBouwmeester commented 3 years ago

Hi Marc,

Yes, an even distribution along the whole retention time range would be perfect.

However, it is not really a problem if the distribution is uneven. The calibration quality might just be a bit lower.
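
If you want to check coverage before running a calibration, one option is to count calibration peptides per retention time bin. The sketch below assumes you have already loaded the observed RTs into a list or array; the helper names `rt_coverage` and `sparse_bins`, the number of bins, and the `min_points` threshold are illustrative choices, not DeepLC settings.

```python
import numpy as np


def rt_coverage(observed_rt, n_bins=10):
    """Return (low, high, count) for equal-width bins over the observed RT range."""
    rt = np.asarray(observed_rt, dtype=float)
    counts, edges = np.histogram(rt, bins=n_bins)
    return [
        (float(edges[i]), float(edges[i + 1]), int(counts[i]))
        for i in range(n_bins)
    ]


def sparse_bins(observed_rt, n_bins=10, min_points=2):
    """Return the bins that hold fewer than min_points calibration peptides."""
    return [b for b in rt_coverage(observed_rt, n_bins) if b[2] < min_points]


# Toy RTs in minutes, with a thin region in the middle of the run.
rts = [1, 2, 3, 50, 51, 52, 53, 99, 100]
for low, high, count in sparse_bins(rts, n_bins=4):
    print(f"only {count} calibration peptide(s) between {low:.2f} and {high:.2f} min")
```

Bins reported here are the regions where a split-wise calibration would have little or nothing to fit, so adding calibration peptides that elute there should help most.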

If you have any further questions regarding this issue please reopen it :).

Kind regards,

Robbin