Closed cgoliver closed 1 year ago
New option for computing clusters. Each distance threshold can be a single value or a list. If a list is given, one clustering is computed per threshold.
    import tempfile

    from proteinshake.datasets import RCSBDataset

    with tempfile.TemporaryDirectory() as tmp:
        da = RCSBDataset(
            root=tmp,
            use_precomputed=False,
            cluster_sequence=True,
            cluster_structure=True,
            distance_threshold_sequence=[0.3, 0.1],
            distance_threshold_structure=[0.3, 0.1],
        )
Protein dict looks like:
{'ID': '6GOX', 'sequence': 'RNDRTLRRMRKVVNIINAMEPEMEKLSDEELKGKTAEFRARLEKGEVLENLIPEAFAVVREASKRVFGMRHFDVQLLGGMVLNERCIAEMRTGEGKTLTATLPAYLNALTGKGVHVVTVNDYLAQRDAENNRPLFEFLGLTVGINLPGMPAPAKREAYAADITYGTNNEYGFDYLRDNMAFSPEERVQRKLHYALVDEVDSILIDEARTPLIISGPAEDSSEMYKRVNKIIPHLIRERGLVLIEELLVKEGGESLYSPANIMLMHHVTAALRAHALFTRDVDYIVKDGEVIWSDGLHQAVEAKEGVQIQNENQTLASITFQNYFRLYEKLAGMTGTADTEAFEFSSIYKLDTVVVPTNRPMIRKDLPDLVYMTEAEKIQAIIEDIKERTAKGQPVLVGTISIEKSELVSNELTKAGIKHNVLNAKFHANEAAIVAQAGYPAAVTIATNMAGRGTDIVLGGSWQAEVAALENPTAEQIEKIKADWQVRHDAVLEAGGLHIIGTERHESRRIDNQLRGRSGRQGDAGSSRFYLSMEDALMRIFASDRVSGMMRKLGMKPGEAIEHPWVTKAIANAQRKVESRNFDIRKQLLEYDDVANDQRRAIYSQRNELLDVSDVSETINSIREDVFKATIDAYIPPQSLEEMWDIPGLQERLKNDFDLDLPIAEWLDKEPELHEETLRERILAQSIEVYQRKEEVVGAEMMRHFEKGVMLQTLDSLWKEHLAAMDYLRQGIHLRGYAQKDPKQEYKRESFSMFAAMLESLKYEVISTLSKVQVRMP', 'structure_cluster_0.3': 7, 'structure_cluster_0.1': 7, 'sequence_cluster_0.3': 0, 'sequence_cluster_0.1': 0}
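The key names in the dict above suggest one entry per threshold. Here is a minimal sketch of how a scalar-or-list threshold could be normalized and mapped to those keys. The helpers `as_threshold_list` and `cluster_keys` are hypothetical names used only for illustration; they are not taken from the proteinshake code base.

```python
from typing import List, Sequence, Union

Threshold = Union[float, Sequence[float]]


def as_threshold_list(threshold: Threshold) -> List[float]:
    """Hypothetical helper: accept a single value or a list, always return a list."""
    if isinstance(threshold, (int, float)):
        return [float(threshold)]
    return [float(t) for t in threshold]


def cluster_keys(kind: str, threshold: Threshold) -> List[str]:
    """Hypothetical helper: build one dict key per threshold.

    The '<kind>_cluster_<threshold>' pattern mirrors the keys in the
    protein dict shown above.
    """
    return [f"{kind}_cluster_{t}" for t in as_threshold_list(threshold)]
```

With this sketch, `cluster_keys("sequence", [0.3, 0.1])` produces the two sequence keys seen in the example dict, and passing a single value such as `0.3` produces a one-element list.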
Note: removed the empty-list default for the `exclude_ids` keyword argument (`exclude_ids=[]`) and replaced it with `None`.
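For context on why a `None` default is usually preferred over `[]`: Python evaluates default values once, at function definition time, so a list default is shared across calls. The sketch below illustrates the general pitfall with two hypothetical functions, `collect_shared` and `collect_fresh`; it is not the dataset constructor itself.

```python
def collect_shared(pdb_id, exclude_ids=[]):
    # Pitfall: the same list object is reused on every call that
    # relies on the default, so entries accumulate across calls.
    exclude_ids.append(pdb_id)
    return exclude_ids


def collect_fresh(pdb_id, exclude_ids=None):
    # Safer pattern: create a new list per call when none is passed.
    if exclude_ids is None:
        exclude_ids = []
    exclude_ids.append(pdb_id)
    return exclude_ids
```

Calling `collect_shared` twice without an explicit list returns the same growing list both times, while `collect_fresh` returns an independent list on each call.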