MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 651 forks source link

Implementation of SugarSelection #4790

Open talagayev opened 1 week ago

talagayev commented 1 week ago

Fixes #4563

Changes made in this Pull Request:

Currently I used the following abbreviations:

https://glycam.org/docs/othertoolsservice/2016/06/09/3d-snfg-list-of-residue-names/index.html

In addition of using the GLYCAM Webserver to obtain the known Sugar abbreviations and also included the aglycans that I obtained from the GLYCAM Weberserver.

The Pytest Files were retrieved from RCSB-PDB and the GLYCAM-Webserver:

https://glycam.org/

PR Checklist

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4790.org.readthedocs.build/en/4790/

pep8speaks commented 1 week ago

Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2024-11-15 22:28:53 UTC
codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.64%. Comparing base (b254921) to head (4cf52a4).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #4790 +/- ## =========================================== - Coverage 93.66% 93.64% -0.03% =========================================== Files 177 189 +12 Lines 21742 22816 +1074 Branches 3055 3055 =========================================== + Hits 20365 21366 +1001 - Misses 930 1003 +73 Partials 447 447 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

talagayev commented 6 days ago

@hmacdope I will ping you here since you were one of the participants in the discussion that initiated the Issue that this PR is covering.

The main problem that I see currently, would be that the GLYCAM Abbreviations for the sugars due to the combinations have quite a lot of different names/abbreviations.

Some of them could lead to tricky cases, with the Allose Nomenclature having RNA as one of the abbreviations.

I checked the RCSB-PDB and found only one case, where a unique ligand was called RNA, but still could be dangerous if the users call something "RNA" in their files.

As for the coverage, while looking at the PDBs in RCSB-PDB it covers NAG, GLC etc., which would be convenient if somebody wants to select those in the PDB Files, but does not cover for example Glycerol, which makes sense since it is not a sugar, but is quite common in crystal structures among those sugars as NAG, GLC. Does it make sense to have a selection that would somehow covers both cases of Glycerol and similar compounds together with sugars?