BytedProtein / ByProt

Apache License 2.0

Question about seemingly missing code from the paper, e.g. AF2 #6

Closed: multydoffer closed this issue 1 year ago

multydoffer commented 1 year ago

It's very exciting to see your work, but I couldn't find the code for some parts of the paper. Specifically:

  1. Code for the AF2 experiments (Section 4.2.1 of the paper)
  2. Figure 7, the comparison of sequence recovery, where you distinguish between the core and surface regions of the proteins
  3. De novo proteins
  4. Antibody design

I am not sure whether I missed something. Could you update these parts of the code, or point me to the corresponding code if I simply overlooked it? Thanks a lot. @zhengzx-nlp

zhengzx-nlp commented 1 year ago

Most of the detailed questions about the paper were resolved in our email correspondence.

For anyone else curious about how the folding regions (core, surface, etc.) in Fig. 7 were annotated and extracted, here is the script:

from Bio.PDB import PDBParser
from Bio.PDB.DSSP import DSSP
import os
from tqdm import tqdm
import multiprocessing as mp
import pandas as pd
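print(mp.cpu_count())  # number of CPU cores available for parallel processing

### get structural labels using DSSP (the DSSP/mkdssp executable must be installed)
path = 'your pdb path'  # e.g. './pepbdb', containing <id>/receptor.pdb for each entry
result_path = 'your result path'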

def DSSP_get_structural_labels(pdb_id, chain):
    """Return a tab-separated line: id, chain, sequence, secondary structure, relative ASA values."""
    protein_f = os.path.join(path, pdb_id, 'receptor.pdb')
    structure = PDBParser(QUIET=True).get_structure(pdb_id, protein_f)[0]  # first model
    dssp = DSSP(structure, protein_f)
    all_seq = ''
    all_ss = ''
    all_sol = ''
    # DSSP keys are (chain_id, residue_id); each value is a tuple
    # (DSSP index, amino acid, secondary structure, relative ASA, phi, psi, ...)
    for a_key in dssp.keys():
        if a_key[0] != chain:
            continue
        values = dssp[a_key]
        all_seq += str(values[1])  # one-letter amino acid code
        all_ss += str(values[2])   # secondary structure label
        # relative solvent accessibility (RSA); the cutoff is 0.1:
        # folding core if RSA <= 0.1, exposed region if RSA > 0.1
        all_sol += str(values[3]) + ','
    return pdb_id + '\t' + chain + '\t' + all_seq + '\t' + all_ss + '\t' + all_sol + '\n'
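The function above writes raw relative-ASA values and does not itself apply the 0.1 cutoff. As a minimal sketch of that step (not the authors' evaluation code, which isn't shown in this thread), the hypothetical helpers below parse one output line, label each residue as core or surface, and compute sequence recovery separately per region. They assume the designed sequence is aligned position-by-position to the residues DSSP reported, and treat non-numeric RSA entries (such as 'NA') as unknown.

```python
import math
from typing import Dict, List, Tuple

CORE_CUTOFF = 0.1  # cutoff quoted in the script: RSA <= 0.1 core, > 0.1 exposed


def parse_label_line(line: str) -> Tuple[str, str, str, str, List[float]]:
    """Split one tab-separated line produced by DSSP_get_structural_labels."""
    pdb_id, chain, seq, ss, sol = line.rstrip('\n').split('\t')
    rsa = []
    for v in sol.split(','):
        if not v:  # skip the empty field left by the trailing comma
            continue
        try:
            rsa.append(float(v))
        except ValueError:  # non-numeric entries such as 'NA'
            rsa.append(float('nan'))
    return pdb_id, chain, seq, ss, rsa


def region_labels(rsa: List[float], cutoff: float = CORE_CUTOFF) -> List[str]:
    """'core' if RSA <= cutoff, 'surface' if RSA > cutoff, 'unknown' for NaN."""
    labels = []
    for v in rsa:
        if math.isnan(v):
            labels.append('unknown')
        elif v <= cutoff:
            labels.append('core')
        else:
            labels.append('surface')
    return labels


def recovery_by_region(native: str, designed: str, labels: List[str]) -> Dict[str, float]:
    """Fraction of positions where designed matches native, computed per region."""
    if not (len(native) == len(designed) == len(labels)):
        raise ValueError('native, designed and labels must have the same length')
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for n, d, lab in zip(native, designed, labels):
        totals[lab] = totals.get(lab, 0) + 1
        hits[lab] = hits.get(lab, 0) + (n == d)
    return {lab: hits[lab] / totals[lab] for lab in totals}


if __name__ == '__main__':
    example = '1abc\tA\tMKVL\tHH-E\t0.05,0.5,0.1,NA,\n'  # made-up line for illustration
    _, _, seq, _, rsa = parse_label_line(example)
    labels = region_labels(rsa)
    print(labels)  # ['core', 'surface', 'core', 'unknown']
    print(recovery_by_region(seq, 'MKAL', labels))  # {'core': 0.5, 'surface': 1.0, 'unknown': 1.0}
```

Keeping the raw RSA values in the output file, as the script does, means the cutoff can be changed later without rerunning DSSP.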