LBC-LNBio / pyKVFinder

pyKVFinder: Python-C parallel KVFinder
https://lbc-lnbio.github.io/pyKVFinder/
GNU General Public License v3.0
19 stars 10 forks source link

Volume vs. Time plot for a particular pocket #16

Closed mukherjeesutanu closed 1 year ago

mukherjeesutanu commented 1 year ago

How to plot the volume/area of a particular cryptic pocket(s) with respect to time? Can we see the changing pocket volume interactively on Jupyter notebook (using nglview)? Thanks.

jvsguerra commented 1 year ago

Hi @mukherjeesutanu,

To plot a volume/area of a particular cryptic pocket, you must perform cavity detection to ensure you are only getting the cavity you want in each frame. For this, you could use a space segmentation approach (box adjustment or ligand adjustment). So you could use the maximum volume or the sum of the volumes, depending on your problem.

# Get all frames of the molecular dynamics simulation
frames = [f for f in sorted(os.listdir('./frames')) if f.endswith('.pdb')]

# Calculating volume and area
volarea = numpy.zeros((2, len(frames)))

# Check if cavities file already exists
if os.path.exists('cavities.pdb'):
    os.remove('cavities.pdb')

# Define a common custom box using parKVFinder's PyMOL plugin
box = {
    'box':{
        'p1': [ 2.03, -3.97, 7.72,],
        'p2': [ 18.93, -3.97, 7.72,],
        'p3': [ 2.03, 16.23, 7.72,],
        'p4': [ 2.03, -3.97, 33.72,],
    }
}

# Write common custom box to file
with open('box.toml', 'w') as f:
    toml.dump(box, f)

# Cavity detection with box adjustment
for frame in frames:
    # Get model number
    num = (int(frame.replace('.pdb', '')))

    # Load atomic data
    atomic = pyKVFinder.read_pdb(os.path.join('./frames', frame))

    # Get vertices from file
    vertices, atomic = pyKVFinder.get_vertices_from_file('./box.toml', atomic, probe_out=12.0)

    # Detect biomolecular cavities
    ncav, cavities = pyKVFinder.detect(atomic, vertices, probe_out=12.0, box_adjustment=True, volume_cutoff=50.0)

    # Spatial characterization
    surface, volume, area = pyKVFinder.spatial(cavities)

    # Accumulate volume and area
    volarea[0, num-1] = sum(volume.values())
    volarea[1, num-1] = sum(area.values())

    # Export cavity over time
    pyKVFinder.export('./cavities.pdb', cavities, surface, vertices, append=True, model=num)

Unfortunately, it is still not possible to see the pocket volume change with nglview. Since cavities have a different number of points to represent their volume, nglview does not handle it appropriately. However, you can easily visualize them in pymol. image

Here is an attachment with a Jupyter Notebook with a complete example of cavity detection using box adjustment: Issue#16.zip

Besides the above, we provide an example of pyKVFinder in molecular dynamics simulation at https://github.com/LBC-LNBio/pyKVFinder/blob/master/examples/md-analysis/md-analysis.ipynb and other examples at https://github.com/LBC-LNBio/pyKVFinder/blob/master/examples. Please check them too.

Hope it helps you! Let me know if you have any further questions.

mukherjeesutanu commented 1 year ago

Thanks for answering. I have another question related to this: if a cryptic pocket is not present in the reference.pdb structure, but it opens during the course of the simulation; how to track these kind of transient pockets using pyKVFinder?

Thanks and Regards, Sutanu

jvsguerra commented 1 year ago

In this scenario, you can perform two analyses:

1) Occurrence analysis

First, you could detect cavities over time and save the occurrence (number of times a voxel - grid point - appears in a cavity). With it, you can export a PDB file with all cavity points over time and the occurrence of each point as a B-factor.

# Create an empty array
occurrence = None

for frame in frames:
    # Load atomic data
    atomic = pyKVFinder.read_pdb(os.path.join('./frames', frame))

    # Get vertices from file
    vertices, atomic = pyKVFinder.get_vertices_from_file('./box.toml', atomic, probe_out=12.0)

    # Detect biomolecular cavities
    ncav, cavities = pyKVFinder.detect(atomic, vertices, probe_out=12.0, volume_cutoff=100.0, box_adjustment=True)

    if occurrence is None:
        occurrence = (cavities > 1).astype(int)
    else:
        occurrence += (cavities > 1).astype(int)

# Get cavity points
noise_cutoff = 2  # minimum number of frames that a cavity point must appear
cavities = 2 * ((occurrence >= noise_cutoff).astype('int32'))

# Export cavities with the percentage of occurrence in the B-factor column
pyKVFinder.export('./occurrence.pdb', cavities, None, vertices, B=occurrence / len(frames)) 

This analysis would help identify transient pockets across the biomolecular surface.

Examples are available at:

2) Number of cavities x Time

After identifying the transient pockets, you can perform cavity detection around regions of interest and save the number of cavities in each frame.

# Create an empty array
ncavities = []

for pdb in pdbs:
    # Load atomic data
    atomic = pyKVFinder.read_pdb(os.path.join('./data', pdb))

    # Get vertices from file
    vertices, atomic = pyKVFinder.get_vertices_from_file('./box.toml', atomic, probe_out=12.0)

    # Detect biomolecular cavities
    ncav, cavities = pyKVFinder.detect(atomic, vertices, probe_out=12.0, volume_cutoff=100.0, box_adjustment=True)

    # Append
    ncavities.append(ncav)

Then, you could plot the number of cavities over time.

print(ncavities)
# [1, 1, 1, 1, 2, 2, 2, 1, 1, 2]

ax = pandas.DataFrame(ncavities).plot.line()

# Customize axis
ax.set_ylim(0, 3)
ax.set_xlim(0, 9)
ax.set_ylabel('Number of cavities')
ax.set_xlabel('Time (ns)')
ax.grid(True)

# Save to file
fig = ax.get_figure()
fig.savefig('./ncavities.png', dpi=300)

ncavities

jvsguerra commented 1 year ago

@mukherjeesutanu, I am closing this issue for now. If you have any further questions, you can open a new issue or reopen this one.