DeepRank / pdb2sql

Fast and versatile biomolecular structure PDB file parser using SQL queries
https://pdb2sql.readthedocs.io
Apache License 2.0

the atom order is independent of the returned values, which should not be the case #78

Closed DTRademaker closed 2 years ago

DTRademaker commented 2 years ago

Hello,

Li and I found a bug. It seems that when you request specific coordinates (or other values) from a PDB, the order of the atom list is not preserved in the return values. So if you switch the order (e.g., CA, C, O -> O, C, CA), the return values are identical, and the user cannot know which values belong to which atoms. The values seem to be returned in the order in which the atoms appear in the PDB file itself. Consequently, if one compares two PDBs in which the atom names appear in a different order, the user will get wrong results and draw wrong conclusions.

Some example code below, the pdb is uploaded with this issue.

    from pdb2sql import pdb2sql
    db = pdb2sql('ref.pdb')
    db.get('x', name=['C', 'O', 'CA'])
    # [-17.244, -16.714, -18.362]
    db.get('x', name=['C', 'CA', 'O'])
    # [-17.244, -16.714, -18.362]
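For anyone who wants to see the same effect without the attached file, here is a minimal sketch using only Python's standard `sqlite3` module. It assumes the name filter behaves like a SQL `IN` clause, which is a guess about how the lookup is built and not something confirmed in this thread. The table layout, the helper `get_x`, and the x values are all made up for illustration.

```python
import sqlite3

# Hypothetical three-atom table; names and x values are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ATOM (name TEXT, x REAL)")
conn.executemany(
    "INSERT INTO ATOM (name, x) VALUES (?, ?)",
    [("CA", 1.0), ("C", 2.0), ("O", 3.0)],
)


def get_x(names):
    """Return x for every atom whose name is in `names` (hypothetical helper)."""
    placeholders = ",".join("?" for _ in names)
    query = f"SELECT x FROM ATOM WHERE name IN ({placeholders})"
    return [row[0] for row in conn.execute(query, names)]


# An IN filter only selects rows; it does not sort them by the list order.
first = get_x(["C", "O", "CA"])
second = get_x(["C", "CA", "O"])
print(first == second)
```

Both calls select the same rows, so the order of the list passed in has no influence on the order of the result.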

ref.zip

NicoRenaud commented 2 years ago

Hi Daniel, I would not really call that a bug; it was the intended behavior, at least. In most cases more than one atom per atom type will be returned (i.e. many oxygens, carbons, etc.), so returning only the coordinates would not be enough to identify the atoms.
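One possible workaround on the user side, sketched under assumptions: if the atom name is requested together with the coordinate, each value carries its own label and can be regrouped in whatever order is needed. The pdb2sql documentation shows comma-separated column strings such as 'x,y,z' for get; whether 'name,x' returns rows shaped like the ones below is an assumption here, so the sketch works on plain (name, value) pairs. The helper `order_by_requested_names` and the sample rows are hypothetical.

```python
from collections import defaultdict


def order_by_requested_names(rows, names):
    """Group (name, value) rows by name, then emit them following `names`.

    Values that share a name keep their original (file) order.
    """
    grouped = defaultdict(list)
    for name, value in rows:
        grouped[name].append(value)
    return [(name, value) for name in names for value in grouped.get(name, [])]


# Hypothetical rows in file order, with two residues so each name occurs twice.
rows = [
    ("CA", 1.0), ("C", 2.0), ("O", 3.0),
    ("CA", 4.0), ("C", 5.0), ("O", 6.0),
]

ordered = order_by_requested_names(rows, ["O", "C", "CA"])
print(ordered)
```

Keeping the name next to each value also covers the case of many atoms per atom type, since nothing has to be inferred from position alone.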

NicoRenaud commented 2 years ago

I would strongly recommend against modifying the behavior of the get function so that it follows the order of the atom list passed in by the user.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 7 days since being marked as stale.