facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

dataset of contact prediction #167

Closed JianquanZhao closed 2 years ago

JianquanZhao commented 2 years ago

I am curious about the '20 proteins' used to train the contact predictor. In your paper (Rao R, Meier J, Sercu T, et al. Transformer protein language models are unsupervised structure learners. International Conference on Learning Representations, 2020), there is a sentence: 'This leaves us with 14882 total sequences. We reserve 20 sequences for training, 20 sequences for validation, and 14842 sequences for testing.' But I didn't see any details about these sequences. Could you give any information about them, or share the code for training the contact predictor?
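For reference, the split described in the paper (14882 sequences, with 20/20/14842 for train/valid/test) could be reproduced generically as below. The paper does not document which sequences were chosen, so the seed and shuffling here are illustrative assumptions, not the authors' actual selection:

```python
import random

def split_sequences(seq_ids, n_train=20, n_valid=20, seed=0):
    """Partition sequence IDs into train/valid/test subsets.

    NOTE: the seed and ordering are illustrative assumptions;
    the paper does not specify how the 20+20 sequences were picked.
    """
    ids = list(seq_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]
    return train, valid, test

# Placeholder IDs standing in for the 14882 trRosetta entries.
all_ids = [f"seq_{i:05d}" for i in range(14882)]
train, valid, test = split_sequences(all_ids)
print(len(train), len(valid), len(test))  # 20 20 14842
```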

JianquanZhao commented 2 years ago

https://yanglab.nankai.edu.cn/trRosetta/benchmark/