compomics / DeepLC

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
https://iomics.ugent.be/deeplc
Apache License 2.0

Tons of warnings when using iTRAQ4plex or iTRAQ8plex #75

Open vrkosk opened 4 months ago

vrkosk commented 4 months ago

If you use a modification like iTRAQ4plex or iTRAQ8plex, whose composition includes 13C or 15N, you get tons of warnings like:

2024-07-16 15:36:18,842 WARNING Could not add the following atom: N[15], attempting to replace the [] part
2024-07-16 15:36:18,842 WARNING Could not add the following atom: C[13], attempting to replace the [] part

This comes from FeatureExtractor.encode_atoms(). Its dict_index argument has a default value that doesn't include 13C or 15N. As far as I can tell, it's not possible to override the default from user code, as the only place where encode_atoms() is called is within FeatureExtractor itself.

RobbinBouwmeester commented 4 months ago

I see, this warning is indeed shown for isotopes. The atom is still counted, but as the base element rather than as an isotope. As far as I know this is not a problem for N and C; deuterium is a different matter.
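To illustrate the fallback described here, this is a minimal sketch of how an isotope such as N[15] could be folded into its base element while counting atoms. It is not DeepLC's actual code: `count_atoms` and `ATOM_INDEX` are hypothetical names, and the real default of dict_index may contain different atoms.

```python
import logging
import re

logger = logging.getLogger("isotope_fallback_sketch")

# Hypothetical atom index; the real default of dict_index in DeepLC may differ.
ATOM_INDEX = {"C": 0, "H": 1, "N": 2, "O": 3, "S": 4, "P": 5}


def count_atoms(composition, atom_index=ATOM_INDEX):
    """Return per-atom counts, folding isotopes like C[13] into the base element."""
    counts = [0] * len(atom_index)
    for atom, n in composition.items():
        if atom not in atom_index:
            logger.warning(
                "Could not add the following atom: %s, "
                "attempting to replace the [] part",
                atom,
            )
            # Drop the bracketed isotope label: "C[13]" becomes "C".
            atom = re.sub(r"\[.*?\]", "", atom)
        if atom in atom_index:
            counts[atom_index[atom]] += n
        # Atoms that are still unknown after stripping are skipped in this sketch.
    return counts
```

With this sketch, `count_atoms({"C": 4, "C[13]": 3})` and `count_atoms({"C": 7})` give the same counts, so the isotope information is lost before it could reach a model.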

It would be possible to only display the same warning a single time, would that solve your issue? I still prefer to show the warning as this is the most transparent thing to do to the user.
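One way to show each warning only once, without touching the code that emits it, is a filter from Python's standard logging module. The sketch below is an assumption about how this could be done, not something DeepLC ships; `WarnOnceFilter` is a hypothetical name.

```python
import logging


class WarnOnceFilter(logging.Filter):
    """Let each distinct formatted message through only the first time it is seen."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        message = record.getMessage()
        if message in self._seen:
            return False  # Drop repeats of a message that was already shown.
        self._seen.add(message)
        return True


# Possible usage from user code (assumes handlers are attached to the root logger):
# for handler in logging.getLogger().handlers:
#     handler.addFilter(WarnOnceFilter())  # one filter instance per handler
```

The filter is attached to handlers rather than to a logger because a filter on a logger only applies to records logged through that exact logger, while a filter on a handler sees every record the handler receives. With this in place, the N[15] and C[13] messages would each appear once instead of once per peptide.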

vrkosk commented 4 months ago

Showing the warning only once would be an improvement.

I'm not sure I understand the current behaviour, though. If it's substituting isotopes, it will predict the same retention time for peptides that only differ by heavy and light SILAC labels, for example:

LSSPATLNSR
LSSPATLNSR[U:Label:13C(6)]

If they do indeed have the same observed retention time, why display the warning? And if they don't have the same retention time, surely DeepLC should try to predict the RT according to the isotope composition?

RobbinBouwmeester commented 4 months ago

Indeed, if the only difference is a heavier isotope, the predictions will be the same. In most cases this is correct, but deuterium, for example, can cause a difference in observed retention time. In your example it should not make a difference in observed retention time.

Unfortunately, DeepLC currently cannot account for different isotopes, and it would take quite a bit of effort to change this. So for now we show a warning to indicate that isotopes are not accounted for.