cctbx / cctbx_project

Computational Crystallography Toolbox
https://cci.lbl.gov/docs/cctbx
Other
224 stars 119 forks source link

Potential problem with **get_specific_ion_radius**: type_energies and atoms in model may be indexed differently when ions are not processed into type_energies #1016

Open rwxayheee opened 1 month ago

rwxayheee commented 1 month ago

Hi all,

I have a question regarding function mmtbx.model.model.get_specific_ion_radius, which is used by probe's helper function getExtraAtomInfo for ions found from a model (manager), such as sodium cation (NA), chloride anion (CL), etc.

I noticed something unusual about get_specific_ion_radius which might be causing what I saw in #1011 as this was used in reduce's optimization routine. Here's a short example for the potential problem:

Input Block [1]

# Download PDB File
pdb_token = "1iep"
! curl "http://files.rcsb.org/view/{pdb_token}.pdb" -o "{pdb_token}.pdb"

# Get Model's Hierarchy
from iotbx.data_manager import DataManager

dm = DataManager()
filename_pdb = "1iep.pdb"
dm.process_model_file(filename_pdb)
model = dm.get_model(filename=filename_pdb)
hierarchy = model.get_hierarchy()
print(hierarchy.composition())

Output from [1]

group_args
  n_atoms                        : 4710
  n_chains                       : 6
  n_hd                           : 0
  n_nucleotide                   : 0
  n_nucleotide_atoms             : 0
  n_other                        : 8
  n_other_atoms                  : 80
  n_protein                      : 548
  n_protein_atoms                : 4458
  n_water                        : 172
  n_water_atoms                  : 172
  other_cnts                     : Counter({' CL': 6, 'STI': 2})

Input Block [2]

# Set some Environment Variables...
import os

os.environ["MMTBX_CCP4_MONOMER_LIB"] = "/Users/amyhe/Downloads/geostd"
# This came from https://github.com/phenix-project/geostd.git
os.environ["CLIBD_MON"] = "/Users/amyhe/Downloads/chem_data/mon_lib"
# This came from https://gitlab.com/phenix_project/chem_data

# Make Restraints...
import mmtbx
from mmtbx.hydrogens import reduce_hydrogen

# with reduce pdb interpretation
p = reduce_hydrogen.get_reduce_pdb_interpretation_params(use_neutron_distances=True)
model.process(make_restraints=True, pdb_interpretation_params=p)
print(f"Number of atoms in model._type_energies: {len(model._type_energies)}")
print(set(model._type_energies))

# with default pdb interpretation
p = mmtbx.model.manager.get_default_pdb_interpretation_params()
model.process(make_restraints=True, pdb_interpretation_params=p)
print(f"Number of atoms in model._type_energies: {len(model._type_energies)}")
print(set(model._type_energies))

Output from [2]

Number of atoms in model._type_energies: 4704
{'NH2', 'CR56', 'NC2', 'OC', 'CH1', 'CH3', 'NC1', 'OH1', 'CR6', 'N', 'O', 'C', 'NH1', 'CR16', 'CH2', 'CR15', 'NR15', 'NT', 'NT3', 'S', 'CR5', 'OH2'}
Number of atoms in model._type_energies: 4704
{'NH2', 'CR56', 'NC2', 'OC', 'CH1', 'CH3', 'NC1', 'OH1', 'CR6', 'N', 'O', 'C', 'NH1', 'CR16', 'CH2', 'CR15', 'NR15', 'NT', 'NT3', 'S', 'CR5', 'OH2'}

From here I found that the 6 CL atoms never entered model._type_energies with either interpretation (all of them are correctly tagged as Ion), and as a result, the CL ions get incorrect types because type_energies and atoms are indexed differently. But I'm a bit lost with how to repair the type_energies atom types, or to make i_seq in sync with the atom types.

Many thanks for looking into this issue and your kind advice. Please let me know if you need any additional details or if there’s anything else I can do to help resolve it.

nwmoriarty commented 1 month ago

Amy

When I run you code in an up-to-date copy of Phenix, I get

group_args n_atoms : 4710 n_chains : 6 n_hd : 0 n_nucleotide : 0 n_nucleotide_atoms : 0 n_other : 8 n_other_atoms : 80 n_protein : 548 n_protein_atoms : 4458 n_water : 172 n_water_atoms : 172 other_cnts : Counter({' CL': 6, 'STI': 2}) Number of atoms in model._type_energies: 4710 {'OC', 'NC1', 'NT3', 'CH3', 'NT', 'CL', 'O', 'CR15', 'N', 'CR56', 'S', 'NH2', 'OH2', 'OH1', 'CR6', 'CH2', 'CH1', 'CR5', 'NR15', 'C', 'NC2', 'CR16', 'NH1'} Number of atoms in model._type_energies: 4710 {'OC', 'NC1', 'NT3', 'CH3', 'NT', 'CL', 'O', 'CR15', 'N', 'CR56', 'S', 'NH2', 'OH2', 'OH1', 'CR6', 'CH2', 'CH1', 'CR5', 'NR15', 'C', 'NC2', 'CR16', 'NH1'}

which contains the CL atoms.

It's possible it's the reading of ener_lib.cif from the env var but requires more investigation. Can you move the geostd to the same directory as mon_lib?

Cheers

Nigel


Nigel W. Moriarty Building 33R0349, Molecular Biophysics and Integrated Bioimaging Lawrence Berkeley National Laboratory Berkeley, CA 94720-8235 Email : @.*** Web : CCI.LBL.gov ORCID : orcid.org/0000-0001-8857-9464

On Thu, Sep 12, 2024 at 11:31 PM Amy He @.***> wrote:

Hi all,

I have a question regarding function mmtbx.model.model.get_specific_ion_radius ( https://github.com/cctbx/cctbx_project/blob/865e8e3e193df6a8e6ea5f1c36080a7ab8d26d81/mmtbx/model/model.py#L2393), which is used by probe's helper function getExtraAtomInfo( https://github.com/cctbx/cctbx_project/blob/865e8e3e193df6a8e6ea5f1c36080a7ab8d26d81/mmtbx/probe/Helpers.py#L343-L405) for ions found from a model (manager), such as sodium cation (NA), chloride anion (CL), etc.

I noticed something unusual about get_specific_ion_radius which might be causing what I saw in #1011 https://github.com/cctbx/cctbx_project/issues/1011 as this was used in reduce's optimization routine. Here's a short example for the potential problem:

Input Block [1]

Download PDB Filepdb_token = "1iep"

! curl "http://files.rcsb.org/view/{pdb_token}.pdb" -o "{pdb_token}.pdb"

Get Model's Hierarchyfrom iotbx.data_manager import DataManager

dm = DataManager()filename_pdb = "1iep.pdb"dm.process_model_file(filename_pdb)model = dm.get_model(filename=filename_pdb)hierarchy = model.get_hierarchy()print(hierarchy.composition())

Output from [1]

group_args n_atoms : 4710 n_chains : 6 n_hd : 0 n_nucleotide : 0 n_nucleotide_atoms : 0 n_other : 8 n_other_atoms : 80 n_protein : 548 n_protein_atoms : 4458 n_water : 172 n_water_atoms : 172 other_cnts : Counter({' CL': 6, 'STI': 2})

Input Block [2]

Set some Environment Variables...

import os

os.environ["MMTBX_CCP4_MONOMER_LIB"] = "/Users/amyhe/Downloads/geostd"

This came from https://github.com/phenix-project/geostd.git

os.environ["CLIBD_MON"] = "/Users/amyhe/Downloads/chem_data/mon_lib"

This came from https://gitlab.com/phenix_project/chem_data

Make Restraints...

import mmtbx from mmtbx.hydrogens import reduce_hydrogen

with reduce pdb interpretation

p = reduce_hydrogen.get_reduce_pdb_interpretation_params(use_neutron_distances=True) model.process(make_restraints=True, pdb_interpretation_params=p) print(f"Number of atoms in model._type_energies: {len(model._type_energies)}") print(set(model._type_energies))

with default pdb interpretation

p = mmtbx.model.manager.get_default_pdb_interpretation_params() model.process(make_restraints=True, pdb_interpretation_params=p) print(f"Number of atoms in model._type_energies: {len(model._type_energies)}") print(set(model._type_energies))

Output from [2]

Number of atoms in model._type_energies: 4704 {'NH2', 'CR56', 'NC2', 'OC', 'CH1', 'CH3', 'NC1', 'OH1', 'CR6', 'N', 'O', 'C', 'NH1', 'CR16', 'CH2', 'CR15', 'NR15', 'NT', 'NT3', 'S', 'CR5', 'OH2'} Number of atoms in model._type_energies: 4704 {'NH2', 'CR56', 'NC2', 'OC', 'CH1', 'CH3', 'NC1', 'OH1', 'CR6', 'N', 'O', 'C', 'NH1', 'CR16', 'CH2', 'CR15', 'NR15', 'NT', 'NT3', 'S', 'CR5', 'OH2'}

From here I found that the 6 CL atoms never entered model._type_energies with either interpretation (all of them are correctly tagged as Ion), and as a result, the CL ions get incorrect types because type_energies and atoms are indexed differently. But I'm a bit lost with how to repair the type_energies atom types, or to make i_seq in sync with the atom types.

Many thanks for looking into this issue and your kind advice. Please let me know if you need any additional details or if there’s anything else I can do to help resolve it.

— Reply to this email directly, view it on GitHub https://github.com/cctbx/cctbx_project/issues/1016, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHYQPDAQ5DVWEWSGTCWEVTZWKBDNAVCNFSM6AAAAABOEWCFUOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGUZDGOJYGI4DCMI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

rwxayheee commented 1 month ago

Hi @nwmoriarty

Thanks so much for your kind reply, sorry for my late response. I tried different ways of manipulating the monomer library but they didn't work.. I'm a bit unfamiliar with the code base or any version change of the libraries, but I have a walkaround using the ad_hoc_single_atom_residue class in pdb_interpretation.py -

The cause of the problem was - When monomer mapping returns None - the ith item corresponding to atoms[i_seq] in type_energies and type_h_bonds are skipped and may lead to the different indexing and change in lengths of model._type_energies and model._type_h_bonds. If the indexes are to be reused (for example by mmtbx.model.model.get_specific_ion_radius), the length needs to stay the same (at least with placeholders like None or False).

The walkaround is - A CL (chloride ion) is interpreted as an ad_hoc_single_atom_residue. The energy_type for its atom can be registered in a similar manner to this: https://github.com/cctbx/cctbx_project/blob/d82c25109f9dc204bbf2fae247c479ab3ca313de/mmtbx/monomer_library/pdb_interpretation.py#L2750-L2751

While nonbonded_energy_type_registry.assign_directly assigns only symbols: https://github.com/cctbx/cctbx_project/blob/d82c25109f9dc204bbf2fae247c479ab3ca313de/mmtbx/monomer_library/pdb_interpretation.py#L954-L955 I use ad_hoc.energy_type to assign energy_type. I don't know a quick way to get the h_bond_type though, but with the exception of CL and F, all other ad_hoc_single_atom_residues would have the type N.

For future reference - I'm using exactly this ener_lib.cif from this repository: https://github.com/phenix-project/geostd.git

There seem to be some version differences when I install cctbx-base by conda/mamba with python=3.10 vs. python=3.12. Here's how I installed cctbx-base (for the issue I reported): micromamba create -n cctbx-nightly -c cctbx-nightly -c conda-forge cctbx-base python=3.12

I can close this issue if this is not reproducible with the full copy of Phenix. Thanks again for your time and kind advice, as always.