Open alphataubio opened 6 months ago
This is likely arising due to removal of coarse-grained particle names from the elements
dictionary or an element symbol being present in the dictionary but not having a vdw_radii
key in the element's subdictionary.
For quick reference, the code in question is: https://github.com/BradyAJohnston/MolecularNodes/blob/fba40bddbee05e2891ca92908df9f311085f42b8/molecularnodes/io/parse/mda.py#L175-L181
Do you happen to have any heavy metal atoms in this structure?
Doesn't look like it. But the atom names listed in the .psf file are certainly not standard atom names and aren't being handled well before the vdw_radii
method is called. Specifically, the unexpected coarse-grained particles' atom names are fed into the MDAnalysis.topology.guessers.guess_atom_elements() to guess an element symbol:
Applying this code to the coarse-grained particles in your system returns a set of "elements symbols":
elements = [x if x in mn.data.elements.keys() else MDAnalysis.topology.guessers.guess_atom_element(x) for x in all_sel.atoms.names]
print(set(elements))
> {'A', 'AL', 'B', 'C', 'G', 'H', 'I', 'L', 'M', 'P', 'S', 'T'}
This is a perfect example of the the MDA guesser failing on us. Related to #452 and #468. Iodine doesn't have a vdw_radii defined in its subdictionary.
This is likely arising due to removal of coarse-grained particle names from the
elements
dictionary or an element symbol being present in the dictionary but not having avdw_radii
key in the element's subdictionary.
where is this elements dictionary ? in MDAnalysis in python somewhere ?? looks like i have to add the spica cg amino acids to it manually.
Do you happen to have any heavy metal atoms in this structure?
no it's a protein with 687 residues, spica coarse grains each amino acids into 1-5 beads:
PSF
2 !NTITLE
* created by setup_lammps
* dummy
1579 !NATOM
1 MET 1 MET GBTL GBTL 0.111800 56.0385
2 MET 1 MET MET MET 0.000000 75.1543
3 ALA 1 ALA ABBL ABBL 0.000000 71.0732
4 GLU 1 GLU GBML GBML 0.000000 56.0385
5 GLU 1 GLU GLU GLU -0.111800 72.0526
6 GLU 1 GLU GBML GBML 0.000000 56.0385
7 GLU 1 GLU GLU GLU -0.111800 72.0526
8 LEU 1 LEU GBML GBML 0.000000 56.0385
9 LEU 1 LEU LEU LEU 0.000000 57.1151
10 VAL 1 VAL GBML GBML 0.000000 56.0385
11 VAL 1 VAL VAL VAL 0.000000 43.0883
...
The elements
dictionary is in https://github.com/BradyAJohnston/MolecularNodes/blob/main/molecularnodes/data.py but the codes that I linked to above are using MDAnalysis to fill in missing atom information when they're not included in the structure/topology files input to MolcularNodes.
I'm getting a fix pushed shortly.
Applying this code to the coarse-grained particles in your system returns a set of "elements symbols":
elements = [x if x in mn.data.elements.keys() else MDAnalysis.topology.guessers.guess_atom_element(x) for x in all_sel.atoms.names] print(set(elements)) > {'A', 'AL', 'B', 'C', 'G', 'H', 'I', 'L', 'M', 'P', 'S', 'T'}
This is a perfect example of the the MDA guesser failing on us.
computer is only as dumb as what you tell it, i have to be explicit and provide the cg spica definitions somewhere.
Related to #452 and #468.
ok ill look at those other issues and try to contribute a fix.
Iodine doesn't have a vdw_radii defined in its subdictionary.
there's no iodine anywhere in this file, only the 20 standard amino acids. Since iodine has an atomic weight of ~127, then the code appears to be matching based on weight so a side chain with the backbone of MET say with two cg beads for a total weight of 56+75=131 is getting matched as iodine incorrectly
Yeah, the MDAnalysis guesser functions get pretty desperate to fill in missing information. The element guesser seems to be doing some strange things.
Yeah, the MDAnalysis guesser functions get pretty desperate to fill in missing information. The element guesser seems to be doing some strange things.
i have to merge the spica cg beads information into data.py, this way no guessing required.
see attached json: spica_top.json
@rbdavid i have vdw radii up to 99 from pubchem, and the rest are available in mathematica. ill fork this repo and add all the radii to data.py later tonight.
I only glanced at the elements data in Wolfram Mathematica; if I recall correctly, I was unable to find the sources for those values. MDAnalysis also includes a table of vdw radii values but its very limited. If you have the time, check the vdw radius values against the papers I link to in the #468. Filling out the elements
dictionary would be greatly appreciated.
But in this case, the vdw_radii values listed in the elements
dictionary will not be accurate for your system. In fact, the current implementation of the MDAnalysis.topology.guessers.guess_atom_element()
is assigning incorrect element symbols to your coarse-grained particles. These incorrect element symbols are then used to grab vdw_radii values from the MolecularNodes elements
dictionary. If the element symbol is not represented in the element dictionary, then a default value of 100 picometers is used for the particle's vdw_radii value. This may affect your Blender visualizations so be wary.
@rbdavid how do you run this python code:
elements = [x if x in mn.data.elements.keys() else MDAnalysis.topology.guessers.guess_atom_element(x) for x in all_sel.atoms.names]
print(set(elements))
> {'A', 'AL', 'B', 'C', 'G', 'H', 'I', 'L', 'M', 'P', 'S', 'T'}
i tried pasting it in the blender python window but i got:
mn.data.elements.keys() Traceback (most recent call last): File "
", line 1, in NameError: name 'mn' is not defined elements = [x if x in mn.data.elements.keys() else MDAnalysis.topology.guessers.guess_atom_element(x) for x in all_sel.atoms.names] Traceback (most recent call last): File "
", line 1, in NameError: name 'all_sel' is not defined print(set(elements)) Traceback (most recent call last): File "
", line 1, in NameError: name 'elements' is not defined
TLDR: Coarse-grained particles should never be mapped to elements but they currently are when importing such a structure file into MolecularNodes. Basically, don't trust the assigned element
, mass
, nor vdw_radii
values listed in the respective vertex attributes for a CG'd system. They're gonna be wrong. Your visualizations will likely be affected.
Oh shoot. The code snippet that you are trying to use is a part of a small ipython instance that I ran to debug what was happening for this error. It highlights a deeper, fundamental bug in how MN (but really MDAnalysis) fills in missing information about atoms' element labels. Its not something I was doing within blender or the MolecularNodes python API. I'll try to recreate it here in full:
import MDAnalysis
import molecularnodes as mn
import pprint
u = MDAnalysis.Universe('4pyg-e.psf','4pyg-20fs-3frames.lammpsdump')
all_sel = u.select_atoms('all')
print(sorted(set(all_sel.atoms.names)))
> ['ABB', 'ABBL', 'ABBS', 'AR1', 'AR2', 'ASN', 'ASP', 'CYS', 'GBB', 'GBBL', 'GBBS', 'GBM', 'GBML', 'GBMS', 'GBTL', 'GLN', 'GLU', 'HI1', 'HI2', 'HI3', 'ILE', 'LEU', 'LY1', 'LY2', 'MET', 'PH1', 'PH2', 'PH3', 'PH4', 'PRO', 'SER', 'THR', 'TR1', 'TR2', 'TR3', 'TY1', 'TY2', 'TY3', 'TY4', 'VAL']
elements = [x if x in mn.data.elements.keys() else MDAnalysis.topology.guessers.guess_atom_element(x) for x in all_sel.atoms.names]
print(sorted(set(elements)))
> ['A', 'AL', 'B', 'C', 'G', 'H', 'I', 'L', 'M', 'P', 'S', 'T']
pprint.pp(list(zip(all_sel.atoms.names, elements)))
The mn.data.elements
dictionary only contains information about elements so none of the coarse-grained particles in your structure file will be parsed by that. Those particles names are therefore handed over to the MDAnalysis guesser function ((source code) [https://docs.mdanalysis.org/stable/_modules/MDAnalysis/topology/guessers.html#guess_atom_element]) that does some pretty loose guessing, as you can see. For example, the VAL
particles in your structure are mapped to an AL
element symbol. and your gut reaction might be "HUH?!?!" That makes no sense when we know the context of the coarse grain parameters/particles.
But, MDAnalysis nor MN can know that context easily; there are literally too many all-atom and coarse grained force fields out there to have a guesser that covers all that ground without ambiguity/conflicts in name spaces. Like, the atom name CD
maps to a delta carbon atom name in most all-atom force fields but also gets mapped to a particle name in the Martini coarse grain force field. Its a mess! and a nontrivial amount of work to sort everything out.
Once PR #485 is merged with main, the fatal error you are seeing that stems from all this weirdness should be squashed. You'll be able to load your structure in just fine. But! and this is a big but, none of your elemental data values will be accurate. They shouldn't even be considered. The assigned element
, vdw_radii
, nor mass
values for vertices will be wrong. If you have the correct data for the CG-equivalent mass and radius values, I highly suggest you correct those attributes so that, when they are used to visualize your system, you are looking at the correct visualization. Right now, your VAL particles are likely getting the mass and radius values from aluminum, while others are getting hydrogen's or boron's.
If anyone knows a gpt-like or natural language processing tool that can guess force-field context (all-atom versus coarse-grained, or GROMOS versus AMBER force fields) from a structure file, that might be a great first step to solving this problem.
['ABB', 'ABBL', 'ABBS', 'AR1', 'AR2', 'ASN', 'ASP', 'CYS', 'GBB', 'GBBL', 'GBBS', 'GBM', 'GBML', 'GBMS', 'GBTL', 'GLN', 'GLU', 'HI1', 'HI2', 'HI3', 'ILE', 'LEU', 'LY1', 'LY2', 'MET', 'PH1', 'PH2', 'PH3', 'PH4', 'PRO', 'SER', 'THR', 'TR1', 'TR2', 'TR3', 'TY1', 'TY2', 'TY3', 'TY4', 'VAL']
where should i add these in data.py ? in elements or coarse_grain_particles ??
when i grep the whole MN repo, "coarse_grain_particles" only shows up in data.py. is it even used somewhere ?
That coarse_grain_particles
dictionary is a stash of entries to a previous version of the elements
dictionary that did contain Martini coarse grain information. It is not used anywhere at the moment.
You can place the SPICA coarse-grain particle information into that dictionary since, in my opinion, it should not go into the elements
dictionary. Still, we then need to do some reworking in the mda.py
(see below) and molecule.py
code to somehow flip a switch to say "use coarse-grained dictionary instead of element dictionary" while assigning particle attribute values. This gets a bit complicated because I think its possible to mix coarse-grained particles into all-atom systems. Its just a mess.
Alternatively, you posted a json file earlier that contains all relevant information about the SPICA CG force field. It should be feasible/possible to analyze that json file and fill in incorrectly assigned attribute values within the Blender instance so that you're working with the expected set of values.
I'm not familiar enough with the SPICA force field to say what those values are or what not. I can help with parsing the json and sending data to attributes but not so much with considering the values associated with that model.
im working on adding all the spica beads into elements dictionary starting with atomic number 1000, ill make a draft PR when data.py is ready
added vdv_radii up to 99 and spica cg particles to data.py see PR 486
SOURCES:
Describe the bug loading MD lammps cg spica trajectory gives error and does not load.
To Reproduce Topology: 4ypg-e.psf Trajectory: 4ypg-20fs.lammpsdump "Load"
Expected behavior load md lammps trajectory without error
Error Codes
Desktop (please complete the following information):
Additional context .psf and .lammpsdump with 3 frames attached as zip 4pyg-mwe.zip