facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

dataset of contact prediction #167

Closed JianquanZhao closed 2 years ago

JianquanZhao commented 2 years ago

I am curious about the '20 proteins' used to train the contact predictor. In your paper (Rao R, Meier J, Sercu T, et al. Transformer protein language models are unsupervised structure learners. International Conference on Learning Representations, 2020), there is a sentence: 'This leaves us with 14882 total sequences. We reserve 20 sequences for training, 20 sequences for validation, and 14842 sequences for testing.' But I didn't see any details about these sequences. Could you give any information about them, or share the code for training the contact predictor?
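For reference, the split described in the paper (14882 sequences, with 20/20/14842 for train/valid/test) could be reproduced generically as below. The paper does not document which sequences were chosen, so the seed and shuffling here are illustrative assumptions, not the authors' actual selection:

```python
import random

def split_sequences(seq_ids, n_train=20, n_valid=20, seed=0):
    """Partition sequence IDs into train/valid/test subsets.

    NOTE: the seed and ordering are illustrative assumptions;
    the paper does not specify how the 20+20 sequences were picked.
    """
    ids = list(seq_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]
    return train, valid, test

# Placeholder IDs standing in for the 14882 trRosetta entries.
all_ids = [f"seq_{i:05d}" for i in range(14882)]
train, valid, test = split_sequences(all_ids)
print(len(train), len(valid), len(test))  # 20 20 14842
```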

JianquanZhao commented 2 years ago

https://yanglab.nankai.edu.cn/trRosetta/benchmark/