forlilab / Meeko

Interfacing RDKit and AutoDock
GNU Lesser General Public License v2.1

Minor edit to atom_name_to_molsetup_index; Support input in both ways: --macromol (ProDy) and --pdb (RDKit) #172

Closed rwxayheee closed 1 week ago

rwxayheee commented 1 week ago

This fixed #168 without further changes related to ProDy.

Before (uses names from RDKit's GetPDBResidueInfo(), for --pdb only):

def atom_name_to_molsetup_index(chorizo_residue, atom_name):
    indices = []
    for atom in chorizo_residue.raw_rdkit_mol.GetAtoms():
        name = atom.GetPDBResidueInfo().GetName().strip()
        if name == atom_name:
            indices.append(atom.GetIdx())
    if len(indices) > 1:
        raise RuntimeError(f"multiple atoms matched query atom name {atom_name}")
    if len(indices) == 0:
        return None
    index = indices[0]
    index = chorizo_residue.mapidx_from_raw[index]

After (uses names from parameterized chorizo residue):

def atom_name_to_molsetup_index(chorizo_residue, atom_name):

    # get matched indices from parameterized chorizo_residue
    indices = [index for index, name in enumerate(chorizo_residue.atom_names) if name == atom_name]

    if len(indices) > 1:
        raise RuntimeError(f"multiple atoms matched query atom name {atom_name}")
    if len(indices) == 0:
        return None

    index = indices[0]

diogomart commented 1 week ago

With this change, the atom name used for matching is the one from the templates. This has two problems:

  1. it may confuse the user if the atom name in the input file differs from the one in the template
  2. in the future, we might want templates without associated atom names

I think we should fix the ProDy parser instead.
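
To make the first problem concrete, here is a minimal sketch in plain Python. `StandInResidue`, `input_atom_names`, and `find_unique` are hypothetical names made up for illustration, not part of Meeko, and the atom names are invented. Only the `atom_names` attribute and the error message are taken from the snippets above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandInResidue:
    """Hypothetical stand-in for a chorizo residue; NOT Meeko's real class."""
    input_atom_names: List[str]  # assumed: names as written in the input file
    atom_names: List[str]        # names carried by the matched template


def find_unique(names: List[str], query: str) -> Optional[int]:
    """Return the index of the single name equal to query, or None if absent."""
    indices = [i for i, name in enumerate(names) if name == query]
    if len(indices) > 1:
        raise RuntimeError(f"multiple atoms matched query atom name {query}")
    return indices[0] if indices else None


# Invented example: the input file calls the amide hydrogen "H",
# while the template calls the same atom "HN".
residue = StandInResidue(
    input_atom_names=["N", "CA", "C", "O", "H"],
    atom_names=["N", "CA", "C", "O", "HN"],
)

# Lookup against template names misses the name the user saw in the input file.
print(find_unique(residue.atom_names, "H"))
# Lookup against input names finds it.
print(find_unique(residue.input_atom_names, "H"))
```

In this sketch the user-supplied name is silently not found when only template names are searched, which is the kind of confusion described in point 1.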

rwxayheee commented 1 week ago

Hi @diogomart, OK. Do you mean we should always use the names associated with the "raw" (input) mols per residue? I can look at the details (prodyutils.py?) a little later today. At a quick glance, I'm a bit confused about why .GetPDBResidueInfo().GetName() isn't working with ProDy.

diogomart commented 1 week ago

> Do you mean we should always use names associated with the "raw" (input) mols per residue?

I think so, yes, if the goal is to let the user specify an atom. The user has likely looked at the system in a molecular viewer and is using the atom name from the input file (PDB/mmCIF).
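
A rough sketch of that direction, under stated assumptions: names come from the raw input atoms, and the matched raw index is then translated through a raw-to-parameterized mapping like the `mapidx_from_raw` used in the "before" snippet. The function name `atom_name_to_index_from_input`, the `raw_names` list, and the mapping values are hypothetical, and the mapping is assumed to behave like a dict. This is not the fix that was eventually implemented.

```python
from typing import Dict, List, Optional


def atom_name_to_index_from_input(
    raw_names: List[str],
    mapidx_from_raw: Dict[int, int],
    atom_name: str,
) -> Optional[int]:
    """Match atom_name against input-file names, then map to the parameterized index.

    raw_names: one name per raw input atom (hypothetical; may carry PDB padding).
    mapidx_from_raw: assumed dict-like mapping, raw atom index -> parameterized index.
    """
    # Strip whitespace because PDB atom-name fields are padded to four columns.
    indices = [i for i, name in enumerate(raw_names) if name.strip() == atom_name]
    if len(indices) > 1:
        raise RuntimeError(f"multiple atoms matched query atom name {atom_name}")
    if len(indices) == 0:
        return None
    return mapidx_from_raw[indices[0]]


# Invented data: four raw atoms whose order differs after parameterization.
raw_names = [" N  ", " CA ", " C  ", " O  "]
mapidx_from_raw = {0: 0, 1: 2, 2: 4, 3: 5}

print(atom_name_to_index_from_input(raw_names, mapidx_from_raw, "CA"))
print(atom_name_to_index_from_input(raw_names, mapidx_from_raw, "CB"))
```

The point of the design is that the name the user types is compared only with names from the input file, so templates are free to use different names or none at all.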