Loading a complex PDB file combines different chains

BradyAJohnston / MolecularNodes

Toolbox for molecular animations in Blender, powered by Geometry Nodes.

https://bradyajohnston.github.io/MolecularNodes/

GNU General Public License v3.0


Loading a complex PDB file combines different chains #610

Closed perezbertoldi closed 2 weeks ago

perezbertoldi commented 2 weeks ago

Describe the bug: I am trying to load a local PDB file into Blender through Molecular Nodes. The file is quite complex, as it contains ~150 chains (chain names are numbers or lower- or upper-case letters, and are one or two characters long). When I load the file into Blender and try to add a Selection Geometry Node, the number of chains is drastically reduced and multiple chains are combined into single ones. I tried attaching the PDB file that causes the issue, but it is larger than 25 MB.

Expected behavior: I expected all chains to be preserved, exactly as when I open the PDB file in ChimeraX.

Desktop (please complete the following information):

OS: Linux
Hardware:
Blender Version: 4.2
MolecularNodes Version: Latest

BradyAJohnston commented 2 weeks ago

Are you able to share the example file another way (Google Drive / OneDrive / Dropbox, etc.)?

Alternatively, are you able to share a unique list of all of the chain names?

What happens under the hood is that each chain is assigned an integer value based on the alphabetical order of the unique chain IDs:

np.unique(structure.chain_id)

The result of this unique filtering determines how many chains are read, so some chains are likely being assigned the same ID when they shouldn't be. This might be because you are using a .pdb file and the chain IDs are not being read properly. Can you try exporting to .cif / .mmcif instead?
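To illustrate the mechanism described above, here is a minimal sketch using only NumPy. The chain IDs are made up for illustration and do not come from the reporter's file, and the "keep only the last character" step is an assumption about how a two-character ID might get truncated, not a description of what MolecularNodes or its parser actually does.

```python
import numpy as np

# Hypothetical chain IDs as written in a file (some are two characters long).
full_ids = np.array(["A", "Ar", "Br", "r", "1", "a"])

# np.unique returns the sorted unique IDs; return_inverse gives, for each
# entry, the integer position of its ID in that sorted list.
unique_ids, chain_index = np.unique(full_ids, return_inverse=True)

# Assumption for illustration: a reader that keeps only the last character
# of each ID, so "Ar", "Br" and "r" all become "r".
truncated_ids = np.array([cid[-1] for cid in full_ids])
unique_trunc, trunc_index = np.unique(truncated_ids, return_inverse=True)

print(len(unique_ids), "chains with full IDs")
print(len(unique_trunc), "chains after truncation")
```

Note that the sort is by character code, so digits come before upper-case letters, which come before lower-case letters. Any IDs that become identical after truncation share one integer and appear as a single chain.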

perezbertoldi commented 2 weeks ago

Sure Brady, thanks for the quick response. Here it goes: https://drive.google.com/file/d/1KFDwjWf3EG_JipE8fC3beWPJxXUQkxI5/view?usp=sharing

I actually tried to simplify the PDB file by renumbering atoms and trying to unify chains in a way that would be convenient downstream in Molecular Nodes but failed horribly so far.

And yes, let me try using a CIF file instead.

BradyAJohnston commented 2 weeks ago

If you open the file in ChimeraX (probably PyMOL / VMD also work), save it as a .cif and import that into Molecular Nodes, the issue is fixed.

As I suspected, the issue comes from chain IDs with multiple letters:

ATOM  A34VJ  C   METAr 413     167.442 365.414 159.845  1.00 33.72           C
ATOM  A34VK  O   METAr 413     166.729 365.172 158.879  1.00 36.38           O
ATOM  A34VL  CB  METAr 413     169.795 364.804 160.364  1.00 34.32           C
ATOM  A34VM  CG  METAr 413     170.776 363.851 161.003  1.00 33.73           C

biotite, which does the file parsing under the hood, is likely only grabbing a single letter from the column position where the chain ID is supposed to be stored.

I'll look into this a bit more, but I assume this is an issue with using .pdb files, and that simply using the newer (and better) .cif files is the way forward.
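As a rough sketch of why this would happen: the PDB format is fixed-width and reserves only column 22 for the chain identifier. The snippet below slices the first record quoted above by column. It is not biotite's actual code, and the idea that the writing program spilled a two-character ID into columns 21-22 is an assumption based on how the record is laid out.

```python
# First atom record quoted above, cut off after the residue number.
record = "ATOM  A34VJ  C   METAr 413"

# Residue name occupies columns 18-20 (Python indices 17..19).
res_name = record[17:20]

# Per the PDB format, the chain ID is the single character in column 22
# (Python index 21).
strict_chain_id = record[21]

# Assumption: the file stores a two-character chain ID in columns 21-22.
wide_chain_id = record[20:22].strip()

print(res_name, strict_chain_id, wide_chain_id)
```

Under that assumption, a reader that follows the column layout strictly sees chain "r" here, and would see the same "r" for any other chain whose two-character ID ends in that letter.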

BradyAJohnston commented 2 weeks ago

As an aside: PDB files were not designed for structures this large and should not be used for them. .cif is the new standard (and .pdb has been deprecated / retired for years), so I suggest moving away from .pdb files.

perezbertoldi commented 2 weeks ago

Thank you, that actually worked nicely. And this is good advice, from now on I will save my files as CIF instead of PDB.