Open G-Armstrong opened 1 year ago
The data within the `directed_graph` looks like this for each workIndex. Notice the DONOR_HYDRO term for each donor entry. This term describes the hydrogen atom participating in the hydrogen bond. DONOR, on the other hand, is the electronegative antecedent atom that the DONOR_HYDRO atom is covalently bound to.
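As a rough sketch of how entries end up in `directed_graph`: the `pretty_print()` function described below keeps only donor-acceptor and acceptor-donor pairs from `hashmap`. This is not the notebook's code. The function name `filter_donor_acceptor_pairs`, the `(kind, atom)` tuple labels, and the toy residues are all assumptions made up for illustration, and the real keys and values in `hashmap` may be shaped differently.

```python
from collections import defaultdict


def filter_donor_acceptor_pairs(hashmap):
    """Keep only pairs in which one cone is a donor and the other an acceptor.

    Assumption: every cone is a (kind, atom) tuple with kind equal to
    "DONOR" or "ACCEPTOR". The notebook's actual labels may differ.
    """
    directed_graph = defaultdict(list)
    for half1_cone, half2_cones in hashmap.items():
        for half2_cone in half2_cones:
            # A set of the two kinds equals {"DONOR", "ACCEPTOR"} only for
            # mixed pairs, so donor-donor and acceptor-acceptor are skipped.
            if {half1_cone[0], half2_cone[0]} == {"DONOR", "ACCEPTOR"}:
                directed_graph[half1_cone].append(half2_cone)
    return dict(directed_graph)


# Toy input (invented): keys are cones of half 1, values are cones of half 2.
example_hashmap = {
    ("DONOR", "A:LYS12:NZ"): [
        ("ACCEPTOR", "B:GLU45:OE1"),
        ("DONOR", "B:ARG50:NH1"),
    ],
    ("ACCEPTOR", "A:ASP30:OD2"): [("ACCEPTOR", "B:GLU45:OE2")],
    ("ACCEPTOR", "A:GLU8:OE1"): [("DONOR", "B:SER3:OG")],
}

example_graph = filter_donor_acceptor_pairs(example_hashmap)
```

Keys whose partners are all of the same kind simply never appear in the result, which matches the idea that such pairs could not form a hydrogen bond.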
The `graph_data_set.ipynb` notebook includes code that can identify the ontological classification of all hydrogen bonds present at the interface between two interacting protein halves in the SKEMPIv2 data set. The notebook has a function called `pretty_print()` that accepts a dictionary called `hashmap` as input. The keys in `hashmap` represent acceptor and donor cones of protein half 1, while the values for each key represent the acceptor and donor cones in protein half 2 that intersect and face the cones of half 1.

Hydrogen bonds can only form between acceptor and donor pairs that are both oriented towards one another and fall within the 4.6 Å hydrogen bond cutoff (i.e. 2 * 2.3 Å = 4.6 Å). Luckily, the `hashmap` input already contains the acceptor/donor pairs that meet these criteria, but it also contains acceptor-acceptor and donor-donor entries and k,v pairs that could not possibly interact. Therefore, the `pretty_print()` function filters `hashmap` down to only acceptor-donor and donor-acceptor pairs and appends them to a new dictionary called `directed_graph` after printing them out neatly to the console.

The task is to understand how this code works and build a JSON file that can be queried for the network of hydrogen bonds present at the interface of any given wild type (WT) or mutant type (MT) PDB. The JSON file should take a nested dictionary format: