KIT-MBS / coconet

RNA Contact Prediction Using Coevolution and Convolutional Neural Network
MIT License

How do I get contact from PDB file #1

Open Ashranzzler opened 2 years ago

Ashranzzler commented 2 years ago

Hi, I would like to ask how I can get ground-truth contacts from the PDB files in your provided dataset. Thanks.

MehariBZ commented 2 years ago

Hi, thank you for the question.

To obtain contacts from PDB files, pydca can be used. For example:

from pydca.contact_visualizer.contact_visualizer import DCAVisualizer

# Replace these placeholders with your own paths and chain ID
pdb_file = 'path_to_the_pdb_file'
refseq_file = 'path_to_the_refseq_file'
chain_id = 'pdb_chain_id'

dca_vis_instance = DCAVisualizer(
    biomolecule='RNA',
    pdb_file=pdb_file,
    pdb_chain_id=chain_id,
    refseq_file=refseq_file,
)

# Map PDB contacts onto the reference sequence
mapped_pairs, missing_pairs = dca_vis_instance.get_mapped_pdb_contacts()

mapped_pairs is a dictionary whose keys are site pairs (index starting from zero) and values are tuples of atom pair names and distances.

Using a contact definition of, e.g., 10 Angstrom, and keeping only nucleotides that are more than 4 sites apart in the sequence, contacts are filtered as:

contacts = [
    site_pair for site_pair in mapped_pairs
    if mapped_pairs[site_pair][-1] < 10.0 and site_pair[1] - site_pair[0] > 4
]
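
To try the filtering step without a PDB file, here is a minimal sketch that applies the same rule to a toy dictionary. The helper name `filter_contacts`, its parameters, and the toy entries are hypothetical and not part of pydca. The exact layout of each value tuple is an assumption as well; the sketch only relies on the distance being the last element.

```python
def filter_contacts(mapped_pairs, max_distance=10.0, min_separation=4):
    """Return sorted site pairs that are closer than max_distance (Angstrom)
    and more than min_separation sites apart in the sequence."""
    contacts = []
    for (i, j), value in mapped_pairs.items():
        distance = value[-1]  # assumption: distance is the last tuple element
        if distance < max_distance and abs(j - i) > min_separation:
            contacts.append((min(i, j), max(i, j)))
    return sorted(contacts)


# Hypothetical toy data standing in for the real mapped_pairs dictionary
toy_pairs = {
    (0, 3): (("P", "C4'"), 6.2),   # dropped: only 3 sites apart
    (3, 7): (("N1", "N3"), 7.5),   # dropped: exactly 4 sites apart
    (0, 8): (("N1", "N3"), 8.7),   # kept
    (2, 7): (("C2", "O6"), 9.9),   # kept: 5 sites apart, under 10 Angstrom
    (1, 12): (("P", "P"), 15.4),   # dropped: too far apart in space
}

contacts = filter_contacts(toy_pairs)
print(contacts)  # [(0, 8), (2, 7)]
```

Exposing the distance cutoff and the sequence separation as parameters makes it easy to test other contact definitions on the same dictionary.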

Best, Mehari