Closed hrp1000 closed 2 months ago
Hi @hrp1000,
Based on your description, I propose two methods to obtain your cavity coordinates:
When you use pyKVFinder.detect
, you get a cavity grid with:
-1
: solvent0
: biomolecule1
: empty space (cavities that do not reach volume cutoff)>2
cavitiesUsing numpy operations, you can determine each cavity's center and limits (xyz min and max coordinates). Then, you can adapt pyKVFinder.export to produce the cavity coordinates as a PDB file or replicate pyKVFinder.export operations to convert the grid to 3D Cartesian coordinates.
After detecting the cavities, identify the interface residues surrounding each cavity using pyKVFinder.constitutional. Combining these residues with atomic
information, you can obtain the 3D coordinates of the atoms composing these residues. From these coordinates, calculate the center and the minimum and maximum coordinates for each cavity.
Just one question, is there a specific format that AutoDock Vina uses?
I believe the second approach is simpler than the first. I'm running some tests on my end and can share the results with you.
Thanks for the rapid response.
I think that either option is more complicated than I was after. After all, you have already written code to extract the cavities etc for me!
I'm currently running pyKVFinder.workflow because that's easy to do and I can get the information that I need.
However, pyKVFinder.export will always produce a file, and I can read that and extract the information about each cavity - but then I delete the file which just seems a messier way to do things.
I would prefer to be able to store the output of export in a variable (e.g. a list or a string) that I can then parse easily, without having to produce a temporary file.
Apropos Autodoc Vina - it uses a modified PDB format called "PDBQT" that replaces items on each ATOM (or HETATM) line after the coordinates with partial charge ("Gasteiger charges") information.
If I understand correctly, you want to store the 3D coordinates of the cavity points from the pyKVFinder.run_workflow
output (pyKVFinder.pyKVFindeResults
). So, I have worked on a script to convert the cavity grid (results.cavities
) to cavity coordinates.
Here's the script (approach 1-ish):
import os
from typing import Dict
import numpy
import pyKVFinder
# Protein to test script: 1FMO.pdb
TESTFILE = os.path.join(
os.path.dirname(pyKVFinder.__file__), "data", "tests", "1FMO.pdb"
)
def _grid2indexes(cavities: numpy.ndarray, cavnum: int) -> numpy.ndarray:
indexes = numpy.argwhere(cavities == cavnum)
return indexes
def _indexes2coord(
indexes: numpy.ndarray, step: float, vertices: numpy.ndarray
) -> numpy.ndarray:
from pyKVFinder.grid import _get_sincos
# P1, P2, P3, P4 (origin, x-axis, y-axis, z-axis)
P1, P2, P3, P4 = vertices
# Calculate sin and cos for each axis
sincos = _get_sincos(vertices)
# Convert grid to 3D Cartesian coordinates
xaux, yaux, zaux = (indexes * step).T
x = (
(xaux * sincos[3])
+ (yaux * sincos[0] * sincos[2])
- (zaux * sincos[1] * sincos[2])
+ P1[0]
)
y = (yaux * sincos[1]) + (zaux * sincos[0]) + P1[1]
z = (
(xaux * sincos[2])
- (yaux * sincos[0] * sincos[3])
+ (zaux * sincos[1] * sincos[3])
+ P1[2]
)
# Prepare 3D coordinates
coords = numpy.array([x, y, z]).T
return coords
def grid2coords(results: pyKVFinder.pyKVFinderResults) -> dict:
# Prepare dictionary to store cavities coordinates
cavities_coords = {key: [] for key in results.residues.keys()}
for cavnum, key in enumerate(cavities_coords, start=2):
indexes = _grid2indexes(results.cavities, cavnum)
coords = _indexes2coord(indexes, results._step, results._vertices)
cavities_coords[key] = coords
return cavities_coords
def coords2pdb(coords: Dict[str, numpy.ndarray], filename: str = "cavity.pdb") -> None:
with open(filename, "w") as f:
i = 0
for key, coords in coords.items():
for coord in coords:
i += 1
f.write(
"ATOM {:5d} H {:3s} 259 {:8.3f}{:8.3f}{:8.3f} 1.00 0.00 \n".format(
i, key, coord[0], coord[1], coord[2]
)
)
def run(pdb: str = TESTFILE):
# Detection and characterization (volume, area, depth, hydropathy, interface residues)
# of cavities in pdb file.
# -> results <pyKVFinderResults object> has the following attributes:
# - cavities: 3D grid of the cavities
# - residues: list of residues in the protein
# - depth: 3D grid of the depth of each residue
# - hydropathy: 3D grid of the hydropathy of each residue
results = pyKVFinder.run_workflow(
pdb, include_depth=True, include_hydropathy=True
)
# Cavities grid to 3D coordinates
cavities_coords = grid2coords(results)
# Get cavities center and limits
center = [{key: value.mean(axis=0)} for key, value in cavities_coords.items()]
minmax = {
key: [ # KAA, KAB, ...
value.min(axis=0), # [xmin, ymin, zmin]
value.max(axis=0), # [xmax, ymax, zmax]
]
for key, value in cavities_coords.items()
}
# If you wish, you can save cavities to a PDB file
coords2pdb(cavities_coords, "cavity.pdb")
return cavities_coords, center, minmax
if __name__ == "__main__":
coords, center, minmax = run()
Let me know if this works for you.
As an alternative approach, I have worked on a script that uses the interface residues to obtain their atomic information, including 3D coordinates. These interface residues are found by pyKVFinder.constitutional
and stored in results.residues
from the pyKVFinder.run_workflow
output (pyKVFinder.pyKVFindeResults
). In this approach, instead of having the cavity coordinates, we use the coordinates of the interface residues surrounding the cavity.
Here's the script (approach 2):
import os
from typing import List, Dict
import numpy
import pyKVFinder
# Protein to test script: 1FMO.pdb
TESTFILE = os.path.join(
os.path.dirname(pyKVFinder.__file__), "data", "tests", "1FMO.pdb"
)
def _get_atomic_information(
residues: Dict[str, List[str]], cavtag: str, atomic: numpy.ndarray
) -> numpy.ndarray:
# Get atomic information from residues
resatomic = numpy.array(["_".join(item[0:3]) for item in residues[cavtag]])
# Extract atominfo from atomic
atominfo = numpy.asarray(
([[f"{atom[0]}_{atom[1]}_{atom[2]}", atom[3]] for atom in atomic[:, :4]])
)
# Get coordinates of residues
indexes = numpy.in1d(atominfo[:, 0], resatomic)
return atomic[indexes]
def res2atomic(results: pyKVFinder.pyKVFinderResults, atomic: numpy.ndarray) -> Dict[str, numpy.ndarray]:
# Prepare dictionary to store residues coordinates
residues_coords = {key: [] for key in results.residues.keys()}
for cavtag in residues_coords.keys():
# Get coordinates of residues
residues_coords[cavtag] = _get_atomic_information(
results.residues, cavtag, atomic
)
return residues_coords
def rescoords2pdb(
coords: Dict[str, numpy.ndarray], filename: str = "cavity.pdb"
) -> None:
with open(filename, "w") as f:
i = 0
for key, coords in coords.items():
for coord in coords:
i += 1
f.write(
"ATOM {:5d} {:3s} {:3s} {:1s}{:4d} {:8.3f}{:8.3f}{:8.3f} 1.00100.00 \n".format(
i,
str(coord[3]),
str(coord[2]),
key[-1],
int(coord[0]),
float(coord[4]),
float(coord[5]),
float(coord[6]),
)
)
def run(pdb: str = TESTFILE):
# Detection and characterization (volume, area, depth, hydropathy, interface residues)
# of cavities in pdb file.
# -> results <pyKVFinderResults object> has the following attributes:
# - cavities: 3D grid of the cavities
# - residues: list of residues in the protein
# - depth: 3D grid of the depth of each residue
# - hydropathy: 3D grid of the hydropathy of each residue
# NOTE: ignore_backbone (default=False) changes which interface residues are
# identified as surrounding cavities.
results = pyKVFinder.run_workflow(
pdb, include_depth=True, include_hydropathy=True, ignore_backbone=False
)
# Get atomic information from pdb file
atomic = pyKVFinder.read_pdb(pdb)
# Residues list to 3D coordinates
residues_coords = res2atomic(results, atomic)
# Get cavities center and limits
center = [
{key: value[:, 4:7].astype(float).mean(axis=0)}
for key, value in residues_coords.items()
]
minmax = {
key: [ # KAA, KAB, ...
value[:, 4:7].astype(float).min(axis=0), # [xmin, ymin, zmin]
value[:, 4:7].astype(float).max(axis=0), # [xmax, ymax, zmax]
]
for key, value in residues_coords.items()
}
# If you wish, you can save cavities to a PDB file
rescoords2pdb(residues_coords, "residues.pdb")
return residues_coords, center, minmax
if __name__ == "__main__":
residues_coords, center, minmax = run()
Hi João - the "1-ish" solution is exactly what I wanted - many thanks! Also the code for both examples is very useful in helping me to follow what's going on in pyKVFinder.
Problem
I need to extract the overall dimensions and centre of each cavity so that I can give them to Autodoc Vina for docking a ligand. An easy way would be to export to "cavity.pdb", then parse the file, but this isn't very satisfying. There must be a way to do this without writing a "cavity.pdb" file then reading it back, but I haven't found anything in the documentation that tells me how to do this.
Proposed solution
My guess is that I've missed something...