LBC-LNBio / pyKVFinder

pyKVFinder: Python-C parallel KVFinder
https://lbc-lnbio.github.io/pyKVFinder/
GNU General Public License v3.0
19 stars 9 forks source link

Extract cavity "ATOM" coordinates for use within a program #110

Closed hrp1000 closed 2 months ago

hrp1000 commented 2 months ago

Problem

I need to extract the overall dimensions and centre of each cavity so that I can give them to Autodoc Vina for docking a ligand. An easy way would be to export to "cavity.pdb", then parse the file, but this isn't very satisfying. There must be a way to do this without writing a "cavity.pdb" file then reading it back, but I haven't found anything in the documentation that tells me how to do this.

Proposed solution

My guess is that I've missed something...

jvsguerra commented 2 months ago

Hi @hrp1000,

Based on your description, I propose two methods to obtain your cavity coordinates:

Cavity grid

When you use pyKVFinder.detect, you get a cavity grid with:

Using numpy operations, you can determine each cavity's center and limits (xyz min and max coordinates). Then, you can adapt pyKVFinder.export to produce the cavity coordinates as a PDB file or replicate pyKVFinder.export operations to convert the grid to 3D Cartesian coordinates.

Interface residues

After detecting the cavities, identify the interface residues surrounding each cavity using pyKVFinder.constitutional. Combining these residues with atomic information, you can obtain the 3D coordinates of the atoms composing these residues. From these coordinates, calculate the center and the minimum and maximum coordinates for each cavity.

Just one question, is there a specific format that AutoDock Vina uses?

I believe the second approach is simpler than the first. I'm running some tests on my end and can share the results with you.

hrp1000 commented 2 months ago

Thanks for the rapid response.

I think that either option is more complicated than I was after. After all, you have already written code to extract the cavities etc for me!

I'm currently running pyKVFinder.workflow because that's easy to do and I can get the information that I need.

However, pyKVFinder.export will always produce a file, and I can read that and extract the information about each cavity - but then I delete the file which just seems a messier way to do things.

I would prefer to be able to store the output of export in a variable (e.g. a list or a string) that I can then parse easily, without having to produce a temporary file.

Apropos Autodoc Vina - it uses a modified PDB format called "PDBQT" that replaces items on each ATOM (or HETATM) line after the coordinates with partial charge ("Gasteiger charges") information.

jvsguerra commented 2 months ago

If I understand correctly, you want to store the 3D coordinates of the cavity points from the pyKVFinder.run_workflow output (pyKVFinder.pyKVFindeResults). So, I have worked on a script to convert the cavity grid (results.cavities) to cavity coordinates.

Here's the script (approach 1-ish):

import os
from typing import Dict

import numpy
import pyKVFinder

# Protein to test script: 1FMO.pdb
TESTFILE = os.path.join(
    os.path.dirname(pyKVFinder.__file__), "data", "tests", "1FMO.pdb"
)

def _grid2indexes(cavities: numpy.ndarray, cavnum: int) -> numpy.ndarray:
    indexes = numpy.argwhere(cavities == cavnum)
    return indexes

def _indexes2coord(
    indexes: numpy.ndarray, step: float, vertices: numpy.ndarray
) -> numpy.ndarray:
    from pyKVFinder.grid import _get_sincos

    # P1, P2, P3, P4 (origin, x-axis, y-axis, z-axis)
    P1, P2, P3, P4 = vertices

    # Calculate sin and cos for each axis
    sincos = _get_sincos(vertices)

    # Convert grid to 3D Cartesian coordinates
    xaux, yaux, zaux = (indexes * step).T

    x = (
        (xaux * sincos[3])
        + (yaux * sincos[0] * sincos[2])
        - (zaux * sincos[1] * sincos[2])
        + P1[0]
    )
    y = (yaux * sincos[1]) + (zaux * sincos[0]) + P1[1]
    z = (
        (xaux * sincos[2])
        - (yaux * sincos[0] * sincos[3])
        + (zaux * sincos[1] * sincos[3])
        + P1[2]
    )

    # Prepare 3D coordinates
    coords = numpy.array([x, y, z]).T

    return coords

def grid2coords(results: pyKVFinder.pyKVFinderResults) -> dict:
    # Prepare dictionary to store cavities coordinates
    cavities_coords = {key: [] for key in results.residues.keys()}

    for cavnum, key in enumerate(cavities_coords, start=2):
        indexes = _grid2indexes(results.cavities, cavnum)
        coords = _indexes2coord(indexes, results._step, results._vertices)
        cavities_coords[key] = coords

    return cavities_coords

def coords2pdb(coords: Dict[str, numpy.ndarray], filename: str = "cavity.pdb") -> None:
    with open(filename, "w") as f:
        i = 0
        for key, coords in coords.items():
            for coord in coords:
                i += 1
                f.write(
                    "ATOM  {:5d}  H   {:3s}   259    {:8.3f}{:8.3f}{:8.3f}  1.00  0.00            \n".format(
                        i, key, coord[0], coord[1], coord[2]
                    )
                )

def run(pdb: str = TESTFILE):
    # Detection and characterization (volume, area, depth, hydropathy, interface residues)
    # of cavities in pdb file.
    # -> results <pyKVFinderResults object> has the following attributes:
    #   - cavities: 3D grid of the cavities
    #   - residues: list of residues in the protein
    #   - depth: 3D grid of the depth of each residue
    #   - hydropathy: 3D grid of the hydropathy of each residue
    results = pyKVFinder.run_workflow(
        pdb, include_depth=True, include_hydropathy=True
    )

    # Cavities grid to 3D coordinates
    cavities_coords = grid2coords(results)

    # Get cavities center and limits
    center = [{key: value.mean(axis=0)} for key, value in cavities_coords.items()]
    minmax = {
        key: [  # KAA, KAB, ...
            value.min(axis=0),  # [xmin, ymin, zmin]
            value.max(axis=0),  # [xmax, ymax, zmax]
        ]
        for key, value in cavities_coords.items()
    }

    # If you wish, you can save cavities to a PDB file
    coords2pdb(cavities_coords, "cavity.pdb")

    return cavities_coords, center, minmax

if __name__ == "__main__":
    coords, center, minmax = run()

Let me know if this works for you.

jvsguerra commented 2 months ago

As an alternative approach, I have worked on a script that uses the interface residues to obtain their atomic information, including 3D coordinates. These interface residues are found by pyKVFinder.constitutional and stored in results.residues from the pyKVFinder.run_workflow output (pyKVFinder.pyKVFindeResults). In this approach, instead of having the cavity coordinates, we use the coordinates of the interface residues surrounding the cavity.

Here's the script (approach 2):

import os
from typing import List, Dict

import numpy
import pyKVFinder

# Protein to test script: 1FMO.pdb
TESTFILE = os.path.join(
    os.path.dirname(pyKVFinder.__file__), "data", "tests", "1FMO.pdb"
)

def _get_atomic_information(
    residues: Dict[str, List[str]], cavtag: str, atomic: numpy.ndarray
) -> numpy.ndarray:
    # Get atomic information from residues
    resatomic = numpy.array(["_".join(item[0:3]) for item in residues[cavtag]])

    # Extract atominfo from atomic
    atominfo = numpy.asarray(
        ([[f"{atom[0]}_{atom[1]}_{atom[2]}", atom[3]] for atom in atomic[:, :4]])
    )

    # Get coordinates of residues
    indexes = numpy.in1d(atominfo[:, 0], resatomic)

    return atomic[indexes]

def res2atomic(results: pyKVFinder.pyKVFinderResults, atomic: numpy.ndarray) -> Dict[str, numpy.ndarray]:
    # Prepare dictionary to store residues coordinates
    residues_coords = {key: [] for key in results.residues.keys()}

    for cavtag in residues_coords.keys():
        # Get coordinates of residues
        residues_coords[cavtag] = _get_atomic_information(
            results.residues, cavtag, atomic
        )

    return residues_coords

def rescoords2pdb(
    coords: Dict[str, numpy.ndarray], filename: str = "cavity.pdb"
) -> None:
    with open(filename, "w") as f:
        i = 0
        for key, coords in coords.items():
            for coord in coords:
                i += 1
                f.write(
                    "ATOM  {:5d}  {:3s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}  1.00100.00            \n".format(
                        i,
                        str(coord[3]),
                        str(coord[2]),
                        key[-1],
                        int(coord[0]),
                        float(coord[4]),
                        float(coord[5]),
                        float(coord[6]),
                    )
                )

def run(pdb: str = TESTFILE):
    # Detection and characterization (volume, area, depth, hydropathy, interface residues)
    # of cavities in pdb file.
    # -> results <pyKVFinderResults object> has the following attributes:
    #   - cavities: 3D grid of the cavities
    #   - residues: list of residues in the protein
    #   - depth: 3D grid of the depth of each residue
    #   - hydropathy: 3D grid of the hydropathy of each residue
    # NOTE: ignore_backbone (default=False) changes which interface residues are
    # identified as surrounding cavities.
    results = pyKVFinder.run_workflow(
        pdb, include_depth=True, include_hydropathy=True, ignore_backbone=False
    )

    # Get atomic information from pdb file
    atomic = pyKVFinder.read_pdb(pdb)

    # Residues list to 3D coordinates
    residues_coords = res2atomic(results, atomic)

    # Get cavities center and limits
    center = [
        {key: value[:, 4:7].astype(float).mean(axis=0)}
        for key, value in residues_coords.items()
    ]
    minmax = {
        key: [  # KAA, KAB, ...
            value[:, 4:7].astype(float).min(axis=0),  # [xmin, ymin, zmin]
            value[:, 4:7].astype(float).max(axis=0),  # [xmax, ymax, zmax]
        ]
        for key, value in residues_coords.items()
    }

    # If you wish, you can save cavities to a PDB file
    rescoords2pdb(residues_coords, "residues.pdb")

    return residues_coords, center, minmax

if __name__ == "__main__":
    residues_coords, center, minmax = run()
hrp1000 commented 2 months ago

Hi João - the "1-ish" solution is exactly what I wanted - many thanks! Also the code for both examples is very useful in helping me to follow what's going on in pyKVFinder.