box.toml file generation

mukherjeesutanu commented 1 year ago

How to get a box.toml file for a particular protein of interest surrounding the whole protein?

jvsguerra commented 1 year ago

Hi @mukherjeesutanu,

To detect cavities throughout the whole protein in pyKVFinder, you do not need to define a box.toml file because the default detection already does it automatically. Here's an example code snippet to illustrate how to do it:

import os
import pyKVFinder

# Define detection parameters
step = 0.6
probe_out = 4.0
pdb = os.path.join(os.path.dirname(pyKVFinder.__file__), 'data', 'tests', '1FMO.pdb')

# Read atomic information about 1FMO
atomic = pyKVFinder.read_pdb(pdb)

# Get vertices of the whole protein
vertices = pyKVFinder.get_vertices(atomic, probe_out=probe_out, step=step)
print(vertices)
[[-19.911 -32.125 -30.806]
 [ 40.188 -32.125 -30.806]
 [-19.911  43.446 -30.806]
 [-19.911 -32.125  27.352]]

# Detect cavities
ncav, cavities = pyKVFinder.detect(atomic, vertices, step=step, probe_out=probe_out)

# Spatial characterization
surface, volume, area = pyKVFinder.spatial(cavities, step=step)
print(volume)
{'KAA': 137.16, 'KAB': 47.52, 'KAC': 66.96, 'KAD': 8.21, 'KAE': 43.63, 'KAF': 12.53, 'KAG': 6.26, 'KAH': 520.13, 'KAI': 12.31, 'KAJ': 26.57, 'KAK': 12.31, 'KAL': 33.91, 'KAM': 23.11, 'KAN': 102.82, 'KAO': 6.05, 'KAP': 15.55, 'KAQ': 7.99, 'KAR': 7.78} == {'KAA': 137.16, 'KAB': 47.52, 'KAC': 66.96, 'KAD': 8.21, 'KAE': 43.63, 'KAF': 12.53, 'KAG': 6.26, 'KAH': 520.13, 'KAI': 12.31, 'KAJ': 26.57, 'KAK': 12.31, 'KAL': 33.91, 'KAM': 23.11, 'KAN': 102.82, 'KAO': 6.05, 'KAP': 15.55, 'KAQ': 7.99, 'KAR': 7.78}

Now, if you want to get a box.toml file for a particular protein of interest surrounding the whole protein, you can use the vertices from the above detection to define the box. Here is the box.toml file to be used:

[box]
p1 = [-19.911, -32.125, -30.806,]
p2 = [40.188, -32.125, -30.806,]
p3 = [-19.911, 43.446, -30.806,]
p4 = [-19.911, -32.125, 27.352,]

Then, you can use the box.toml file to detect cavities inside the box as follows:

# Define detection parameters
step = 0.6
probe_in = 1.4
probe_out = 4.0
pdb = os.path.join(os.path.dirname(pyKVFinder.__file__), 'data', 'tests', '1FMO.pdb')
box = "box.toml"

# Read atomic information about 1FMO
atomic = pyKVFinder.read_pdb(pdb)

# Get vertices from a custom box and select atoms inside the box
vertices, atomic = pyKVFinder.get_vertices_from_file("box.toml", atomic, step=step, probe_in=0.0, probe_out=0.0)
# Since you are defining a box from a whole protein detection, this custom box has already been padded by the probe_out. So, in this scenario, you do not need to add Probe Out or Probe In. However, if you define the `box.toml` directly from residues or box coordinates, you must add Probe Out and Probe In here!
print(vertices)
[[-19.911 -32.125 -30.806]
 [ 40.188 -32.125 -30.806]
 [-19.911  43.446 -30.806]
 [-19.911 -32.125  27.352]]

# Detect cavities inside box (box_adjustment=True)
ncav, cavities = pyKVFinder.detect(atomic, vertices, step=step, probe_out=probe_out, box_adjustment=True)

# Spatial characterization
surface, volume, area = pyKVFinder.spatial(cavities, step=step)
print(volume)
{'KAA': 137.16, 'KAB': 47.52, 'KAC': 66.96, 'KAD': 8.21, 'KAE': 43.63, 'KAF': 12.53, 'KAG': 6.26, 'KAH': 520.13, 'KAI': 12.31, 'KAJ': 26.57, 'KAK': 12.31, 'KAL': 33.91, 'KAM': 23.11, 'KAN': 102.82, 'KAO': 6.05, 'KAP': 15.55, 'KAQ': 7.99, 'KAR': 7.78} == {'KAA': 137.16, 'KAB': 47.52, 'KAC': 66.96, 'KAD': 8.21, 'KAE': 43.63, 'KAF': 12.53, 'KAG': 6.26, 'KAH': 520.13, 'KAI': 12.31, 'KAJ': 26.57, 'KAK': 12.31, 'KAL': 33.91, 'KAM': 23.11, 'KAN': 102.82, 'KAO': 6.05, 'KAP': 15.55, 'KAQ': 7.99, 'KAR': 7.78}

Furthermore, an explanation on how to write the Box configuration file for cavity detection inside a custom box can be found here.

I hope this helps! Let me know if you have any further questions.

Issue-37.zip

mukherjeesutanu commented 1 year ago

Thank you @jvsguerra for the detailed solution.

--Sutanu

mukherjeesutanu commented 1 year ago

I have a different issue while reading the pdb file using "pyKVFinder.read_pdb",

pdb = 'conf-b.pdb' atomic = pyKVFinder.read_pdb(pdb)

I got the following error message:

KeyError Traceback (most recent call last) Input In [12], in <cell line: 1>() ----> 1 atomic = pyKVFinder.read_pdb(pdb)

File ~/anaconda3/lib/python3.9/site-packages/pyKVFinder/utils.py:205, in read_pdb(fn, vdw, model) 203 if keep: 204 if line[:4] == "ATOM" or line[:6] == "HETATM": --> 205 atomic.append(_process_pdb_line(line, vdw)) 207 return numpy.asarray(atomic)

File ~/anaconda3/lib/python3.9/site-packages/pyKVFinder/utils.py:129, in _process_pdb_line(line, vdw) 127 radius = vdw[resname][atom] 128 else: --> 129 radius = vdw["GEN"][atom_symbol] 130 logging.info( 131 f"Warning: Atom {atom} of residue {resname} \ 132 not found in dictionary." 133 ) 134 logging.info( 135 f"Warning: Using generic atom {atom_symbol} \ 136 radius: {radius} \u00c5." 137 )

KeyError: ''

@jvsguerra could you please help me out with this issue, thanks. I have attached the pdb file in txt format here as pdb isn't supported. conf-b.txt

jvsguerra commented 1 year ago

Hi, @mukherjeesutanu!

It looks like you encountered a KeyError when trying to access an atom in the van der Waals (vdW) dictionary that doesn't exist. This can happen if the atom symbol in your PDB file is missing or the atom name is in the wrong format.

The vdW dictionary contains information about standard residues, but in some cases, the 'ATOM' information is not of a standard residue (eg non-standard residue or nucleic acid), or the atom name is in a different format. When this happens, pyKVFinder uses generic values for vdW radii. In your case, the error was caused by an empty atom symbol (columns 77-78) being passed as the key.

To fix this issue, you have a couple of options:

1) Add the missing atom symbols to your PDB file.

You can download a corrected version of your PDB file (conf-b-corrected.zip) that includes the missing atom symbol. Alternatively, you can edit your PDB file to add the missing atom symbol yourself.

For example, your original PDB file had this line:

ATOM      1  N   SER   153      42.990  56.590  38.180  1.00  0.00

But in the corrected version, the atom symbol has been added:

ATOM      1  N   SER B 153      42.990  56.590  38.180  1.00  0.00           N

2) Rewrite any residues that are in the wrong format.

Sometimes, pyKVFinder will give you warnings about atoms that are not found in the vdW dictionary. These warnings will appear in the KVFinder.log file. If you see warnings like this, it means that the atom names in your PDB file are in the wrong format and need to be rewritten.

For example, your PDB file gave these warnings:

Warning: Atom HD1 of residue ILE not found in dictionary.
Warning: Using generic atom H radius: 0.91 Å.
Warning: Atom HD2 of residue ILE not found in dictionary.
Warning: Using generic atom H radius: 0.91 Å.
Warning: Atom HD3 of residue ILE not found in dictionary.
Warning: Using generic atom H radius: 0.91 Å.
Warning: Atom HE2 of residue HIS not found in dictionary.
Warning: Using generic atom H radius: 0.91 Å.

Your original PDB file had this line:

ATOM     28  HD1 ILE   154      42.790  62.480  41.120  1.00  0.00

To correct it, the atom name should be changed for these residues in your PDB file:

ATOM     28 HD11 ILE    154      42.790  62.480  41.120  1.00  0.00

In addition to these issues, your PDB file is also missing the chain identifier symbol. It does not pose a problem here, but might be in the pyKVFinder.constitutional for example if you have chains with the same atom numbering.

I hope this helps! If you have any questions or concerns, please don't hesitate to ask.

jvsguerra commented 1 year ago

@mukherjeesutanu, I am closing this issue for now.

If you have any further questions or concerns, please feel free to re-open it and provide any additional information or open a new issue.

LBC-LNBio / pyKVFinder

box.toml file generation #37