BorgwardtLab / proteinshake

Protein structure datasets for machine learning.
https://proteinshake.ai
BSD 3-Clause "New" or "Revised" License

add TMalign on all pairs in dataset class #127

Closed cgoliver closed 1 year ago

cgoliver commented 1 year ago

This adds a new method to the base dataset class that runs TMalign on all pairs of proteins in the dataset. It is activated by the constructor flag all_pairs_distance.

Example usage:

from proteinshake.datasets import RCSBDataset

# all_pairs_distance=True activates the new all-pairs TMalign method
da = RCSBDataset(
    root="tmtest",
    use_precomputed=False,
    all_pairs_distance=True,
)

When the flag is set, the dataset dumps a dictionary storing the TM-score and RMSD for every pair in the dataset to the path {self.root}/{self.__class__.__name__}_tmalign.json

The dictionary is keyed by PDB IDs. For each key, we store a tuple (tm_score, RMSD).
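For anyone who wants to consume the dumped file, here is a rough sketch of reading it back. The exact nesting of the dictionary is not spelled out above, so this assumes a nested layout {pdbid_a: {pdbid_b: [tm_score, rmsd]}}. The helpers tmalign_path and load_pair_scores are hypothetical names, not part of proteinshake, and the PDB IDs and scores in the demo are made up.

```python
import json
import os
import tempfile


def tmalign_path(root, class_name):
    """Build the output path described above: {root}/{ClassName}_tmalign.json."""
    return os.path.join(root, f"{class_name}_tmalign.json")


def load_pair_scores(path):
    """Load the dumped dictionary and restore each [tm_score, rmsd] list to a tuple.

    ASSUMPTION: the file is a nested dict {pdbid_a: {pdbid_b: [tm_score, rmsd]}}.
    Adjust the comprehension if the real layout differs.
    """
    with open(path) as handle:
        raw = json.load(handle)
    return {
        a: {b: tuple(pair) for b, pair in partners.items()}
        for a, partners in raw.items()
    }


# Demo on a hand-made file with made-up values, so no dataset build is needed.
root = tempfile.mkdtemp()
path = tmalign_path(root, "RCSBDataset")
toy = {"1abc": {"2xyz": (0.5, 2.0)}, "2xyz": {"1abc": (0.5, 2.0)}}
with open(path, "w") as handle:
    json.dump(toy, handle)  # JSON has no tuple type, so tuples are written as lists

scores = load_pair_scores(path)
tm_score, rmsd = scores["1abc"]["2xyz"]
```

Note that JSON has no tuple type, so a (tm_score, RMSD) tuple written with Python's json module comes back as a two-element list; the loader above converts it back.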