I want to dock two proteins. I just have two monomers, not a homodimer or a heterodimer. My questions about the masif-search pipeline:

1. Preparing data for the PPI search: when I run data_prepare with `PDBID_CHAIN`, I can't get the `p1_sc_labels.npy` file, but if I run it with `PDBID_CHAIN1_CHAIN2`, I do get it. So I copied the files over, but I don't know whether that is correct.
2. Converting a `RegistrationResult` and `PointCloud` to PDB format: I get the `multidock` result, but I don't know how to convert it to a PDB file. Is there a tool for that?

I am a novice at docking and masif-search, so if this pipeline has any other errors, please point them out; I don't know whether I am doing it right.
## Pipeline

### Run masif-site

First I run masif-site with `./data_prepare_one.sh 2MWS_B`, and then `./predict_site.sh 2MWS_B` to predict the sites. I got four folders and it ran very well.
### Prepare data for the PPI search

Next I run the PPI search: `cd ../masif_ppi_search`. It needs the `PDBID_CHAIN1_CHAIN2` format, but I just have two monomers, not a homodimer or a heterodimer, so the command `/masif/data/masif_ppi_search/data_prepare_one.sh 3AXY_B` doesn't work.

![image](https://user-images.githubusercontent.com/40107519/123605705-b44d6b80-d82e-11eb-99fc-90e04ed21445.png)

Instead, I downloaded and extracted the PDB files manually and moved them into `data_preparation/00-raw_pdbs/`:
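Since the prepare script didn't work for my monomers, I split the chains myself. Below is a minimal, text-based sketch of that split (my own helper, not part of masif; masif's real data_prepare script also does more than this, such as computing the surface, so this only produces the per-chain PDB files):

```python
def extract_chain(raw_pdb, chain_id, out_pdb):
    """Naive PDB chain splitter: keep ATOM/HETATM/TER records of one chain.

    Relies on the fixed-column PDB format, where the chain identifier
    sits in column 22 (index 21) of each coordinate record.
    """
    with open(raw_pdb) as fin, open(out_pdb, "w") as fout:
        for line in fin:
            if line.startswith(("ATOM", "HETATM", "TER")):
                if len(line) > 21 and line[21] == chain_id:
                    fout.write(line)
        fout.write("END\n")

# e.g. split 1A2K.pdb into the per-chain files the pipeline expects:
# extract_chain("data_preparation/00-raw_pdbs/1A2K.pdb", "A",
#               "data_preparation/01-benchmark_pdbs/1A2K_A.pdb")
```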
```
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/00-raw_pdbs/
1A2K.pdb  5JYL.pdb
root@eb92233498e0:/masif/data/masif_ppi_search# ls data_preparation/01-benchmark_pdbs/
1A2K_A.pdb  1A2K_C.pdb  5JYL_A.pdb  5JYL_B.pdb
```

Then I calculate the descriptors and get the output:

```
root@eb92233498e0:/masif/data/masif_ppi_search# ls descriptors/sc05/all_feat/1A2K_C
p1_desc_flipped.npy  p1_desc_straight.npy
root@eb92233498e0:/masif/data/masif_ppi_search# ls descriptors/sc05/all_feat/1A2K_A
p1_desc_flipped.npy  p1_desc_straight.npy
```
It doesn't raise any exceptions, just some warnings. I did the same for the other protein.
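To sanity-check these outputs, the descriptor matrices can simply be loaded with NumPy; each row should be one fingerprint per surface point. This checker is my own sketch (the exact descriptor dimensionality depends on the trained model, so it is not asserted):

```python
import numpy as np

def check_descriptors(desc_file):
    """Load a masif-search descriptor matrix and report its shape."""
    desc = np.load(desc_file)
    # One descriptor vector per surface point, no NaN/inf values.
    assert desc.ndim == 2, "expected a 2-D (points x features) matrix"
    assert np.isfinite(desc).all(), "descriptors should not contain NaN/inf"
    print("{}: {} points x {}-dim descriptors".format(desc_file, *desc.shape))
    return desc

# e.g. check_descriptors("descriptors/sc05/all_feat/1A2K_A/p1_desc_straight.npy")
```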
### Move the model to the workplace

```shell
cd /masif/source/masif_ppi_search/
cp -r /masif/comparison/masif_ppi_search/masif_descriptors_nn/models .
```
### Run docking with masif-search

I looked at `/masif/source/masif_ppi_search/second_stage_alignment_nn.py`; it needs some global configuration variables, so I set those for my own protein list. In `/masif/source/masif_ppi_search` I create a new Python script, `docking.py`:
```python
import os
import copy
import time

import numpy as np
import scipy.sparse as spio
import sklearn.metrics
from Bio.PDB import *
from scipy.spatial import cKDTree

from transformation_training_data.score_nn import ScoreNN
from alignment_utils_masif_search import compute_nn_score, rand_rotation_matrix, \
    get_center_and_random_rotate, get_patch_geo, multidock, test_alignments, \
    subsample_patch_coords
from geometry.open3d_import import *

masif_opts = {}
# Per-chain PDB files.
masif_opts["pdb_chain_dir"] = "data_preparation/01-benchmark_pdbs/"
# Per-chain surfaces computed by masif-site.
masif_opts["ply_chain_dir"] = "data_preparation/01-benchmark_surfaces/"
masif_opts["ppi_search"] = {}
masif_opts["ppi_search"][
    "masif_precomputation_dir"
] = "data_preparation/04b-precomputation_12A/precomputation/"
# Computed by compute_description.sh.
masif_opts["ppi_search"]["desc_dir"] = "descriptors/sc05/all_feat/"
# Only needed for the GIF method, not for MaSIF; an empty directory.
masif_opts["ppi_search"]["gif_descriptors_out"] = "gif_descriptors/"
masif_opts["site"] = {}
masif_opts["site"][
    "masif_precomputation_dir"
] = "data_preparation/04a-precomputation_9A/precomputation/"

nn_model = ScoreNN()
data_dir = "/masif/data/masif_ppi_search"
surf_dir = os.path.join(data_dir, masif_opts["ply_chain_dir"])
desc_dir = os.path.join(data_dir, masif_opts["ppi_search"]["desc_dir"])
pdb_dir = os.path.join(data_dir, masif_opts["pdb_chain_dir"])
precomp_dir = os.path.join(
    data_dir, masif_opts["ppi_search"]["masif_precomputation_dir"]
)
precomp_dir_9A = os.path.join(
    data_dir, masif_opts["site"]["masif_precomputation_dir"]
)

benchmark_list = os.path.join(data_dir, "lists", "mylist.txt")
pdb_list = open(benchmark_list).readlines()[0:100]
pdb_list = [x.rstrip() for x in pdb_list]

# Read all surfaces.
all_pc = []
all_desc = []
rand_list = np.copy(pdb_list)
# np.random.seed(0)
np.random.shuffle(rand_list)
rand_list = rand_list[0:100]

p2_descriptors_straight = []
p2_point_clouds = []
p2_patch_coords = []
p2_names = []
for i, pdb in enumerate(rand_list):
    print("Loading patch coordinates for {}".format(pdb))
    pdb_id = pdb.split("_")[0]
    chains = pdb.split("_")[1]
    # Descriptors for global matching.
    p2_descriptors_straight.append(
        np.load(os.path.join(desc_dir, pdb, "p1_desc_straight.npy"))
    )
    p2_point_clouds.append(
        read_point_cloud(
            os.path.join(surf_dir, "{}.ply".format(pdb_id + "_" + chains))
        )
    )
    pc = subsample_patch_coords(pdb, "p1", precomp_dir_9A)
    p2_patch_coords.append(pc)
    p2_names.append(pdb)

all_positive_scores = []
all_positive_rmsd = []
all_negative_scores = []
# Match all descriptors.
count_found = 0
all_rankings_desc = []

# Now go through each target (p1 in every case) and dock each 'decoy' binder to it.
# The target will have flipped (inverted) descriptors.
K = 30
ransac_iter = 100
ttf = []
for target_ix, target_pdb in enumerate(rand_list):
    target_pdb_id = target_pdb.split("_")[0]
    chains = target_pdb.split("_")[1]
    # Load target descriptors for global matching.
    target_desc = np.load(os.path.join(desc_dir, target_pdb, "p1_desc_flipped.npy"))
    # Load the target point cloud.
    target_pc = os.path.join(surf_dir, "{}.ply".format(target_pdb_id + "_" + chains))
    target_pcd = read_point_cloud(target_pc)
    # Read the point with the highest shape complementarity.
    sc_labels = np.load(os.path.join(precomp_dir, target_pdb, "p1_sc_labels.npy"))
    center_point = np.argmax(np.median(np.nan_to_num(sc_labels[0]), axis=1))
    # Go through each source descriptor, find the top descriptors, store id+pdb.
    num_negs = 0
    all_desc_dists = []
    all_pdb_id = []
    all_vix = []
    gt_dists = []
    # This is where the descriptors are actually compared
    # (stage 1 of the MaSIF-search protocol).
    for source_ix, source_pdb in enumerate(rand_list):
        source_desc = p2_descriptors_straight[source_ix]
        desc_dists = np.linalg.norm(source_desc - target_desc[center_point], axis=1)
        all_desc_dists.append(desc_dists)
        all_pdb_id.append([source_pdb] * len(desc_dists))
        all_vix.append(np.arange(len(desc_dists)))
        if source_pdb == target_pdb:
            source_pcd = p2_point_clouds[source_ix]
            eucl_dists = np.linalg.norm(
                np.asarray(source_pcd.points)
                - np.asarray(target_pcd.points)[center_point, :],
                axis=1,
            )
            eucl_closest = np.argsort(eucl_dists)
            gt_dists = desc_dists[eucl_closest[0:50]]
            gt_count = len(source_desc)
    all_desc_dists = np.concatenate(all_desc_dists, axis=0)
    all_pdb_id = np.concatenate(all_pdb_id, axis=0)
    all_vix = np.concatenate(all_vix, axis=0)
    ranking = np.argsort(all_desc_dists)
    # Load target geodesic distances.
    target_coord = subsample_patch_coords(target_pdb, "p1", precomp_dir_9A, [center_point])
    # Get the geodesic patch and descriptor patch for the target.
    target_patch, target_patch_descs = get_patch_geo(
        target_pcd, target_coord, center_point, target_desc, flip=True
    )
    # Make a cKDTree with the target.
    target_ckdtree = cKDTree(target_patch.points)
    # Load the structure of the target (the ground-truth source is not needed here).
    parser = PDBParser()
    target_struct = parser.get_structure(
        "{}_{}".format(target_pdb_id, chains[0]),
        os.path.join(pdb_dir, "{}_{}.pdb".format(target_pdb_id, chains)),
    )
    # Get coordinates of the target's atoms.
    target_atom_coords = [atom.get_coord() for atom in target_struct.get_atoms()]
    target_ca_coords = [
        atom.get_coord() for atom in target_struct.get_atoms() if atom.get_id() == "CA"
    ]
    target_atom_coord_pcd = PointCloud()
    target_ca_coord_pcd = PointCloud()
    target_atom_coord_pcd.points = Vector3dVector(np.array(target_atom_coords))
    target_ca_coord_pcd.points = Vector3dVector(np.array(target_ca_coords))
    target_atom_pcd_tree = KDTreeFlann(target_atom_coord_pcd)
    target_ca_pcd_tree = KDTreeFlann(target_ca_coord_pcd)

    found = False
    myrank_desc = float("inf")
    chosen_top = ranking[0:K]
    pos_scores = []
    pos_rmsd = []
    neg_scores = []
    # This is where the matched descriptors are actually aligned (stage 2).
    for source_ix, source_pdb in enumerate(rand_list):
        viii = chosen_top[np.where(all_pdb_id[chosen_top] == source_pdb)[0]]
        source_vix = all_vix[viii]
        if len(source_vix) == 0:
            continue
        source_desc = p2_descriptors_straight[source_ix]
        source_pcd = copy.deepcopy(p2_point_clouds[source_ix])
        source_coords = p2_patch_coords[source_ix]
        # Randomly rotate and translate.
        random_transformation = get_center_and_random_rotate(source_pcd)
        source_pcd.transform(random_transformation)
        # Dock and score each matched patch.
        if source_pdb != target_pdb:  # the same structure does not need docking
            all_results, all_source_patch, all_source_scores = multidock(
                source_pcd,
                source_coords,
                source_desc,
                source_vix,
                target_patch,
                target_patch_descs,
                target_ckdtree,
                nn_model,
                ransac_iter=ransac_iter,
            )
            res = {
                "target_pdb": target_pdb,
                "source_pdb": source_pdb,
                "all_results": all_results,
                "all_source_patch": all_source_patch,
                "all_source_scores": all_source_scores,
            }
            ttf.append(res)
# ttf holds several candidate docking poses; the right ones are picked via the fingerprint search.
```
Inspect `ttf[0]`:

```
>>> ttf[0]
{'target_pdb': '1A2K_C', 'source_pdb': '5JYL_B', 'all_results': [registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result., registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result., registration::RegistrationResult with fitness=2.600000e-01, inlier_rmse=6.074436e-01, and correspondence_set size of 26
Access transformation to get result., registration::RegistrationResult with fitness=0.000000e+00, inlier_rmse=0.000000e+00, and correspondence_set size of 0
Access transformation to get result.], 'all_source_patch': [geometry::PointCloud with 100 points., geometry::PointCloud with 100 points., geometry::PointCloud with 100 points., geometry::PointCloud with 100 points.], 'all_source_scores': [0.0014200918, 0.001413941, 2.9317687e-05, 0.001454716]}
```
`all_results[0]` is an `open3d.open3d.registration.RegistrationResult` object; its `fitness` reflects the number of correspondence points, and if it equals 0 the pose is bad. But how can I convert a `RegistrationResult` and a `PointCloud` to PDB format? I want to get the docked structure.
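I could not find a ready-made masif tool for this, but each `RegistrationResult` carries a 4x4 homogeneous `transformation` matrix, so one way (a sketch under my own assumptions, not an official masif API) is to pick the best pose, apply the transformation chain to the source structure's atoms with Biopython, and write out a new PDB. Note that in the script above the source cloud was first moved by `random_transformation`, so that matrix would have to be stored alongside each `res` entry and applied first:

```python
import numpy as np
from Bio.PDB import PDBParser, PDBIO

def apply_transform(structure, T):
    """Apply a 4x4 homogeneous transformation matrix to every atom in place."""
    T = np.asarray(T)
    R, t = T[:3, :3], T[:3, 3]
    for atom in structure.get_atoms():
        atom.set_coord(R.dot(atom.get_coord()) + t)

def save_docked_pose(source_pdb, out_pdb, transforms):
    """Chain the given 4x4 transforms (applied in order) and save a new PDB."""
    parser = PDBParser(QUIET=True)
    struct = parser.get_structure("source", source_pdb)
    for T in transforms:
        apply_transform(struct, T)
    io = PDBIO()
    io.set_structure(struct)
    io.save(out_pdb)

# Hypothetical usage with a ttf entry (this assumes random_transformation was
# also stored in res, e.g. res["random_transformation"], which the script above
# does not yet do):
# res = ttf[0]
# best = int(np.argmax([r.fitness for r in res["all_results"]]))
# save_docked_pose(
#     os.path.join(pdb_dir, res["source_pdb"] + ".pdb"),
#     "docked_{}_onto_{}.pdb".format(res["source_pdb"], res["target_pdb"]),
#     [res["random_transformation"], res["all_results"][best].transformation],
# )
```

The docked PDB can then be opened next to the target chain's PDB in a viewer to inspect the complex.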
I am a novice at docking, so please help me. Thank you.