Select Chain node has problems separating two-letter named chain

BradyAJohnston / MolecularNodes

Toolbox for molecular animations in Blender, powered by Geometry Nodes.

https://bradyajohnston.github.io/MolecularNodes/

GNU General Public License v3.0


Select Chain node has problems separating two-letter named chain #630

Closed: jojoelfe closed this issue 1 week ago

jojoelfe commented 1 week ago

Describe the bug
Some larger models, such as ribosomes, have so many chains that they sometimes have two-letter IDs: AA, AB, etc. The "Select Chain" node groups these together by their last letter.

To Reproduce
Steps to reproduce the behavior:

1. Open 7PZY (https://www.rcsb.org/structure/7PZY).
2. Select chain "J": this selects both chains "J" and "AJ".

Expected behavior
There should be separate checkboxes for two-letter chains.
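To make the expected behavior concrete, here is a minimal sketch in plain Python of "one checkbox per distinct chain ID" selection. It is not Molecular Nodes' implementation; the helpers `chain_checkboxes` and `select_chains` are hypothetical. It shows that the checkboxes can only be as distinct as the chain IDs the parser hands over: if "AJ" has already been reduced to "J", a single checkbox covers both chains.

```python
from collections.abc import Iterable


def chain_checkboxes(chain_ids: Iterable[str]) -> list[str]:
    """One checkbox per distinct chain ID, in sorted order (hypothetical helper)."""
    return sorted(set(chain_ids))


def select_chains(chain_ids: list[str], ticked: set[str]) -> list[bool]:
    """Per-atom selection mask: True where the atom's chain ID is ticked."""
    return [cid in ticked for cid in chain_ids]


# Per-atom chain IDs as they might arrive from a parser (illustrative data).
intact = ["J", "J", "AJ", "AJ"]     # two-letter IDs preserved
collapsed = ["J", "J", "J", "J"]    # "AJ" truncated to "J"

print(chain_checkboxes(intact))         # ['AJ', 'J']
print(select_chains(intact, {"J"}))     # [True, True, False, False]
print(chain_checkboxes(collapsed))      # ['J']
print(select_chains(collapsed, {"J"}))  # [True, True, True, True]
```

With intact IDs, ticking "J" leaves the "AJ" atoms unselected; with collapsed IDs there is nothing left to tell them apart.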

This is using Blender 4.2.2 with the extension downloaded from the marketplace.

BradyAJohnston commented 1 week ago

I believe this only happens when loading a structure from a .pdb file, which is a limitation of that particular file format (unless you have a non-.pdb example).

If I import / fetch the example structure as either .bcif or .cif, then it detects all of the chains fine. If, however, I open it in ChimeraX, save it as a .pdb, and then try to import that file, we lose the chains. You can't actually download 7PZY (and other large structures) from the PDB as .pdb files for this reason.
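Why .cif keeps the chains: mmCIF stores atoms as whitespace-separated values in an `_atom_site` loop rather than in fixed columns, so a chain ID such as "AJ" is just another token. The sketch below parses a tiny, hand-trimmed `_atom_site` loop; real files carry many more columns and may quote values, and this toy reader is an illustration only, not biotite's parser.

```python
CIF_SNIPPET = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.auth_asym_id
ATOM 1 CA J
ATOM 2 CA AJ
"""


def parse_atom_site(text: str) -> list[dict[str, str]]:
    """Minimal reader for a single, unquoted _atom_site loop (illustrative only)."""
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_atom_site."):
            # Column header: keep the item name after the category prefix
            columns.append(line.split(".", 1)[1])
        else:
            # Data row: values are whitespace-separated, so their width is not fixed
            rows.append(dict(zip(columns, line.split())))
    return rows


for atom in parse_atom_site(CIF_SNIPPET):
    print(atom["id"], atom["auth_asym_id"])
# 1 J
# 2 AJ
```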

This is due to the parsing done by biotite. I am unsure how much can (or should) be done, as large structures shouldn't be using the .pdb format.

BradyAJohnston commented 1 week ago

To clarify what I said:

The creation of the Select Chain node is working fine, but when biotite parses the structure it takes only a single character (column 22) from the PDB file as the chain ID. By the time MN creates the node, both chains J and AJ have already been assigned J by biotite, which is why they are both selected.
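A minimal sketch of the truncation described above, in plain Python rather than biotite's actual code. It assumes the writer right-aligns the chain ID in columns 21-22, so a two-letter ID like "AJ" spills into column 21; that assumption matches the observed result (both chains read back as "J") but is not taken from ChimeraX documentation. `format_atom_line` and the "lenient" reader are hypothetical and only show where the second letter ends up, not a recommended fix.

```python
def format_atom_line(serial: int, chain: str) -> str:
    """Build the first 26 columns of an ATOM record (hypothetical writer).

    The chain ID is right-aligned in columns 21-22, so "AJ" occupies column 21.
    """
    # cols 1-6 record name, 7-11 serial, 12 blank, 13-16 atom name,
    # 17 altLoc, 18-20 residue name, 21-22 chain ID, 23-26 residue number
    return f"ATOM  {serial:>5d}  CA  ALA{chain:>2s}{1:>4d}"


def chain_id_column_22(line: str) -> str:
    # Strict PDB reading: the chain ID is the single character in column 22
    return line[21]


def chain_id_columns_21_22(line: str) -> str:
    # Hypothetical lenient reading that also picks up column 21
    return line[20:22].strip()


lines = [format_atom_line(1, "J"), format_atom_line(2, "AJ")]
print([chain_id_column_22(ln) for ln in lines])      # ['J', 'J']
print([chain_id_columns_21_22(ln) for ln in lines])  # ['J', 'AJ']
```

Column 21 is not part of the chain ID in the PDB format, so reading it is a non-standard workaround; the simpler route is a format without fixed-width columns.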

jojoelfe commented 1 week ago

Ah, yes. Works perfectly with a .cif. ChimeraX does write the two-letter chain IDs to .pdb files, but it might be violating the PDB format in doing so.

In any case, there is no good reason why I shouldn't be using .cif files; I guess it's mostly .pdb nostalgia.

BradyAJohnston commented 1 week ago

It's hard, but we all should start saying goodbye to the .pdb format 🥲