DeepRank / pdb2sql

Fast and versatile biomolecular structure PDB file parser using SQL queries
https://pdb2sql.readthedocs.io
Apache License 2.0
24 stars 12 forks source link

Update pdb2sqlcore.py #72

Closed DTRademaker closed 1 year ago

DTRademaker commented 2 years ago

Add sorting of the values to the 'get' function. This to fix a bug that sometimes occurs when comparing the atoms between multiple pdbs ( e.g. when using StructureSimilarity script). The bug is caused by pdbs having not the same atom-name (e.g. CA C CB O N...) order.

github-actions[bot] commented 2 years ago

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 years ago

This PR was closed because it has been inactive for 7 days since being marked as stale.

LilySnow commented 2 years ago

Hi Cunliang, I do not think we need an order parameter, right? I will accept this change.

NicoRenaud commented 2 years ago

Hi all, I’m not sure I would change the way get returns the values. The behavior described here was precisely what was intended. Since the issue is only in structure similarity I would add the sorting option there and not in get.

CunliangGeng commented 2 years ago

Agree with Nico. That's also why I suggested adding an option to get itself.

NicoRenaud commented 2 years ago

Adding the order argument to get is a solution but it makes get more complicated ... I would really simply fix that in StructureSimilarity and not touch get. The get method is central to all the data pipelines we have created (iscore, deeprank, deeprank GNN) so modifying it is not a trivial decision. Adding an order option would also make get do two things (SQL query + sorting) and that's 1 thing too many :)

LilySnow commented 2 years ago

Hi Nico, there seems no other way to fix this bug without changing the get function. We cannot fix StructureSimilarity downstream...

NicoRenaud commented 2 years ago

Hi Li, I'm not sure what the issue is, but the ordering can surely be done outside of the get function by simply getting the chainID, resSeq, name, x,y,z and ordering the resulting array according to the chain,res,name values. Or am i missing something ?

In any case changing the default behavior of get may break the data processing pipeline of all the codes that rely on it (iscore, deeprank, ... ). So I would be careful about it :)

DTRademaker commented 2 years ago

Hi Li, I'm not sure what the issue is, but the ordering can surely be done outside of the get function by simply getting the chainID, resSeq, name, x,y,z and ordering the resulting array according to the chain,res,name values. Or am i missing something ?

In any case changing the default behavior of get may break the data processing pipeline of all the codes that rely on it (iscore, deeprank, ... ). So I would be careful about it :)

The 'get' function is called 12 times in the 'StructureSimilarity' script for retrieving xyz coordinates. These should all be adjusted then. Yes of course we can do it by retrieving 'chainID, resSeq, name, x,y,z', sorting it and extract the xyz and that times 12. But that would miss the point of having the get function in the first place where you specify what you need. Especially since this could also be done, as implemented by my suggestion, directly in the sql command of the 'get' function. Which I might add is computationally also faster. I understand the concern that it 'might' impact other software. But I could also argue that it 'might' actually improve the other software since the atom-order is currently completely dependent on the pdb-atom-order which can sometimes differ from pdb to pdb. And to be honest I can not imagine any scenario where having the residue atoms in a consistent order would disturb the other code... but that could of course also be my lack of imagination. The code I am suggesting passed all unit-tests, and should therefore also work fine for the other software right?

I didn't wrote the other code, so I can not be sure, but I would suggest to use the extra argument for the 'get' function if you really want to play it safe. But I think the inconsistency problem is a more general problem that should be solved by hardcoding it in the get function, resulting that the rest of the code should be tested and adapted if needed.

github-actions[bot] commented 2 years ago

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 years ago

This PR was closed because it has been inactive for 7 days since being marked as stale.

NicoRenaud commented 1 year ago

Hey @DTRademaker @LilySnow @CunliangGeng, should we merge that ?

DTRademaker commented 1 year ago

Hey @DTRademaker @LilySnow @CunliangGeng, should we merge that ?

Fine with me :)

NicoRenaud commented 1 year ago

@LilySnow if you also agree can you merge the PR ?

github-actions[bot] commented 1 year ago

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This PR was closed because it has been inactive for 7 days since being marked as stale.