BradyAJohnston / MolecularNodes

Toolbox for molecular animations in Blender, powered by Geometry Nodes.
https://bradyajohnston.github.io/MolecularNodes/
GNU General Public License v3.0
912 stars 85 forks source link

Extra/wrong bonds being formed while importing some small molecules #548

Open julioarvellos opened 4 months ago

julioarvellos commented 4 months ago

Hello! I'm importing local molecules to Blender but they automatically create extra bonds between random atoms. The molecules were initially exported from Pymol and sent from a client but I can't reproduce how they exported the files. Curiously the same issue doesn't happen if I open the files with VMD, Discovery Studio or GaussView. I thought it could be a problem with cloud services like OneDrive and Google Drive, but it doesn't seem to be the case.

I used the same workaround mentioned here https://github.com/BradyAJohnston/MolecularNodes/discussions/478 with the Find Bonds node, but for some molecules with very close atoms it didn't work.

This isn't the first time that it happened with a local file, but is not common though..

To Reproduce Simply import the molecule and tab to see the wireframe

Desktop (please complete the following information):

triterpene.zip

Screenshot_1 Screenshot_2

rbdavid commented 4 months ago

I have another example where a small molecule has incorrectly visualized bonds. For the protonated molecule, I pulled a post-MD simulation snapshot and saved the structure to a .pdb file. The .pdb file does not contain any topology info, which is likely the cause for the complete lack of bonds being shown. When downloading the original xtal structure (1OX5), the ligand structure looks great!

Blender 4.2 daily release candidate and MN 4.2.3 extension.

igps_simulation_structure.zip.zip

messy_ligand png

clean_ligand png

BradyAJohnston commented 4 months ago

Okay in my testing - it is an error upstream with biotite. After parsing with biotite there are extra bonds that shouldn't be there.

import biotite.structure.io.pdb as pdb

filepath = "/Users/brady/Downloads/triterpene/triterpene-1.pdb"

file = pdb.PDBFile.read(filepath)
parsed = pdb.get_structure(file, include_bonds=True)
len(parsed.bonds.as_array())

Out[1]: 51

The file has only 33 bonds defined, so biotite is falling down.

In the short term we could add a step to remove bonds over a certain length when importing so they never would make it to the GN stage - but this doesn't catch all of them.

If you'd like to report this issue to biotite that would be great - otherwise I can open an issue up later.

julioarvellos commented 3 months ago

Hey, Brady. I believe you're more aware of the situation behind the curtains about biotite / blender / python etc., so if you don't mind, I'd ask you to report the issue there (and use the files if needed).

For the moment, the #478 workaround is a good alternative for molecules where heavy atoms do not get too close to each other due to flexible conformations. I'm not sure though if this is an issue during trajectories - if the bonds are formed randomly on each frame, or if it is a one-time problem.

I'm wondering if it would be a good idea to pin a post somewhere with common issues that are not caused by MN. 🤔

BradyAJohnston commented 3 months ago

I'll put together a bug report when I get some free time in the next few days

BradyAJohnston commented 2 months ago

The issue is with the files using non-standard atom names. Biotite is adding extra bonds from their atom names according to the CCD (see https://github.com/biotite-dev/biotite/issues/638). I am unsure what to be done here. I could add a check to remove bonds over a certain distance, but this can already be done inside of GN, and won't remove bonds that were incorrectly formed between two close atoms that aren't meant to be bonded.

julioarvellos commented 2 months ago

I think it is already good to be aware of the issue. For simple molecules (my case) the issue can be solved deleting the extra bonds manually inside blender. For complex or many molecules, maybe the fastest option is to convert the file to another format (like mol2 or xyz) and then convert back again to PDB. I tested the conversion PDB > XYZ > PDB and PDB > MOL2 > PDB using Open Babel and both solved the problem with the triterpene molecules. Not sure though how to export multiple files or if it is a general solution...

You can avoid the first PDB > "file" conversion if the first time the molecule is exported it already comes in a different format. Not ideal, but should work.

Also, there could be a warning if a bond with more than 2,5 Angs is formed - the user would be responsible to check if this bond makes sence (e.g. when a metal is present)

In the past I've used many software and each exported the files with a particular issue (pdb and mol2 were the most common file formats to me) and file conversion/correction was part of the workflow...

BradyAJohnston commented 2 months ago

@rbdavid in the example that you showed - the wrong bonds were being formed when you exported the ligand from the simulation. What simulation package were you exporting from and how did you export it?

rbdavid commented 2 months ago

My example's ligand is a native substrate of an imidazole glycerol phosphate synthase (IGPS). I imagined that it wouldn't be included in the CCD for biotite to interpret bonds but its apparently in PDBeChem (https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/1PR). I'm not sure if biotite does a check for this type of small molecule's topology info by default.

Its been a bit since I prepped this system, but I believe the poorly-visualized-in-MN structure was pulled out of a full trajectory made using AMBER. To grab the structure, I used MDAnalysis to load up the trajectory then jumped to a frame and wrote a .pdb file. The missing CONECT lines are likely because I did not tell MDA to explicitly write those lines to file. Alternatively, MDA might not have gathered that information when I created the Universe object. Maybe I didn't use the simulation's prmtop file?

rbdavid commented 2 months ago

Based on the biotite issue linked above, it might be worth implementing the connect_via_distances() method if/when poor bonding is detected/observed.

To make the unfair comparison, vmd and pymol do appropriately draw the ligand's topology correctly from the pdb file.