Open rwxayheee opened 1 month ago
Amy
When I run you code in an up-to-date copy of Phenix, I get
group_args n_atoms : 4710 n_chains : 6 n_hd : 0 n_nucleotide : 0 n_nucleotide_atoms : 0 n_other : 8 n_other_atoms : 80 n_protein : 548 n_protein_atoms : 4458 n_water : 172 n_water_atoms : 172 other_cnts : Counter({' CL': 6, 'STI': 2}) Number of atoms in model._type_energies: 4710 {'OC', 'NC1', 'NT3', 'CH3', 'NT', 'CL', 'O', 'CR15', 'N', 'CR56', 'S', 'NH2', 'OH2', 'OH1', 'CR6', 'CH2', 'CH1', 'CR5', 'NR15', 'C', 'NC2', 'CR16', 'NH1'} Number of atoms in model._type_energies: 4710 {'OC', 'NC1', 'NT3', 'CH3', 'NT', 'CL', 'O', 'CR15', 'N', 'CR56', 'S', 'NH2', 'OH2', 'OH1', 'CR6', 'CH2', 'CH1', 'CR5', 'NR15', 'C', 'NC2', 'CR16', 'NH1'}
which contains the CL atoms.
It's possible it's the reading of ener_lib.cif from the env var but requires more investigation. Can you move the geostd to the same directory as mon_lib?
Cheers
Nigel
Nigel W. Moriarty Building 33R0349, Molecular Biophysics and Integrated Bioimaging Lawrence Berkeley National Laboratory Berkeley, CA 94720-8235 Email : @.*** Web : CCI.LBL.gov ORCID : orcid.org/0000-0001-8857-9464
On Thu, Sep 12, 2024 at 11:31 PM Amy He @.***> wrote:
Hi all,
I have a question regarding function mmtbx.model.model.get_specific_ion_radius ( https://github.com/cctbx/cctbx_project/blob/865e8e3e193df6a8e6ea5f1c36080a7ab8d26d81/mmtbx/model/model.py#L2393), which is used by probe's helper function getExtraAtomInfo( https://github.com/cctbx/cctbx_project/blob/865e8e3e193df6a8e6ea5f1c36080a7ab8d26d81/mmtbx/probe/Helpers.py#L343-L405) for ions found from a model (manager), such as sodium cation (NA), chloride anion (CL), etc.
I noticed something unusual about get_specific_ion_radius which might be causing what I saw in #1011 https://github.com/cctbx/cctbx_project/issues/1011 as this was used in reduce's optimization routine. Here's a short example for the potential problem:
Input Block [1]
Download PDB Filepdb_token = "1iep"
! curl "http://files.rcsb.org/view/{pdb_token}.pdb" -o "{pdb_token}.pdb"
Get Model's Hierarchyfrom iotbx.data_manager import DataManager
dm = DataManager()filename_pdb = "1iep.pdb"dm.process_model_file(filename_pdb)model = dm.get_model(filename=filename_pdb)hierarchy = model.get_hierarchy()print(hierarchy.composition())
Output from [1]
group_args n_atoms : 4710 n_chains : 6 n_hd : 0 n_nucleotide : 0 n_nucleotide_atoms : 0 n_other : 8 n_other_atoms : 80 n_protein : 548 n_protein_atoms : 4458 n_water : 172 n_water_atoms : 172 other_cnts : Counter({' CL': 6, 'STI': 2})
Input Block [2]
Set some Environment Variables...
import os
os.environ["MMTBX_CCP4_MONOMER_LIB"] = "/Users/amyhe/Downloads/geostd"
This came from https://github.com/phenix-project/geostd.git
os.environ["CLIBD_MON"] = "/Users/amyhe/Downloads/chem_data/mon_lib"
This came from https://gitlab.com/phenix_project/chem_data
Make Restraints...
import mmtbx from mmtbx.hydrogens import reduce_hydrogen
with reduce pdb interpretation
p = reduce_hydrogen.get_reduce_pdb_interpretation_params(use_neutron_distances=True) model.process(make_restraints=True, pdb_interpretation_params=p) print(f"Number of atoms in model._type_energies: {len(model._type_energies)}") print(set(model._type_energies))
with default pdb interpretation
p = mmtbx.model.manager.get_default_pdb_interpretation_params() model.process(make_restraints=True, pdb_interpretation_params=p) print(f"Number of atoms in model._type_energies: {len(model._type_energies)}") print(set(model._type_energies))
Output from [2]
Number of atoms in model._type_energies: 4704 {'NH2', 'CR56', 'NC2', 'OC', 'CH1', 'CH3', 'NC1', 'OH1', 'CR6', 'N', 'O', 'C', 'NH1', 'CR16', 'CH2', 'CR15', 'NR15', 'NT', 'NT3', 'S', 'CR5', 'OH2'} Number of atoms in model._type_energies: 4704 {'NH2', 'CR56', 'NC2', 'OC', 'CH1', 'CH3', 'NC1', 'OH1', 'CR6', 'N', 'O', 'C', 'NH1', 'CR16', 'CH2', 'CR15', 'NR15', 'NT', 'NT3', 'S', 'CR5', 'OH2'}
From here I found that the 6 CL atoms never entered model._type_energies with either interpretation (all of them are correctly tagged as Ion), and as a result, the CL ions get incorrect types because type_energies and atoms are indexed differently. But I'm a bit lost with how to repair the type_energies atom types, or to make i_seq in sync with the atom types.
Many thanks for looking into this issue and your kind advice. Please let me know if you need any additional details or if there’s anything else I can do to help resolve it.
— Reply to this email directly, view it on GitHub https://github.com/cctbx/cctbx_project/issues/1016, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHYQPDAQ5DVWEWSGTCWEVTZWKBDNAVCNFSM6AAAAABOEWCFUOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGUZDGOJYGI4DCMI . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi @nwmoriarty
Thanks so much for your kind reply, sorry for my late response. I tried different ways of manipulating the monomer library but they didn't work.. I'm a bit unfamiliar with the code base or any version change of the libraries, but I have a walkaround using the ad_hoc_single_atom_residue class in pdb_interpretation.py -
The cause of the problem was -
When monomer mapping returns None - the ith item corresponding to atoms[i_seq]
in type_energies and type_h_bonds are skipped and may lead to the different indexing and change in lengths of model._type_energies
and model._type_h_bonds
. If the indexes are to be reused (for example by mmtbx.model.model.get_specific_ion_radius), the length needs to stay the same (at least with placeholders like None
or False
).
The walkaround is -
A CL (chloride ion) is interpreted as an ad_hoc_single_atom_residue
. The energy_type
for its atom can be registered in a similar manner to this:
https://github.com/cctbx/cctbx_project/blob/d82c25109f9dc204bbf2fae247c479ab3ca313de/mmtbx/monomer_library/pdb_interpretation.py#L2750-L2751
While nonbonded_energy_type_registry.assign_directly
assigns only symbols:
https://github.com/cctbx/cctbx_project/blob/d82c25109f9dc204bbf2fae247c479ab3ca313de/mmtbx/monomer_library/pdb_interpretation.py#L954-L955
I use ad_hoc.energy_type
to assign energy_type
. I don't know a quick way to get the h_bond_type
though, but with the exception of CL and F, all other ad_hoc_single_atom_residues would have the type N
.
For future reference - I'm using exactly this ener_lib.cif from this repository: https://github.com/phenix-project/geostd.git
There seem to be some version differences when I install cctbx-base by conda/mamba with python=3.10 vs. python=3.12. Here's how I installed cctbx-base (for the issue I reported):
micromamba create -n cctbx-nightly -c cctbx-nightly -c conda-forge cctbx-base python=3.12
I can close this issue if this is not reproducible with the full copy of Phenix. Thanks again for your time and kind advice, as always.
Hi all,
I have a question regarding function mmtbx.model.model.get_specific_ion_radius, which is used by probe's helper function getExtraAtomInfo for ions found from a model (manager), such as sodium cation (NA), chloride anion (CL), etc.
I noticed something unusual about get_specific_ion_radius which might be causing what I saw in #1011 as this was used in reduce's optimization routine. Here's a short example for the potential problem:
Input Block [1]
Output from [1]
Input Block [2]
Output from [2]
From here I found that the 6 CL atoms never entered
model._type_energies
with either interpretation (all of them are correctly tagged as Ion), and as a result, the CL ions get incorrect types because type_energies and atoms are indexed differently. But I'm a bit lost with how to repair the type_energies atom types, or to make i_seq in sync with the atom types.Many thanks for looking into this issue and your kind advice. Please let me know if you need any additional details or if there’s anything else I can do to help resolve it.