facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Dataset for Remote Homology Evaluation #620

Open amoldwin opened 1 year ago

amoldwin commented 1 year ago

Hello, thank you very much for maintaining this repository. Could you provide the dataset used in the ESM-1b paper to evaluate remote homology prediction? The paper and supplement give some details, but I would like to be able to reproduce the results. When I take the SCOPe subset filtered at 40% sequence identity and remove the specified folds, I end up with a different number of sequences than the paper states, and my AUC and HIT-10 are significantly lower than reported. Were there other filtering steps that might explain the discrepancy?
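For concreteness, here is a minimal sketch of the kind of fold-level filtering described above. It assumes SCOPe-style FASTA headers of the form `>domain_id class.fold.superfamily.family (chain) description`. The function names (`parse_fasta`, `fold_of`, `filter_folds`), the toy records, and the excluded fold ID are all placeholders for illustration, not the list or the procedure from the paper.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fold_of(header: str) -> str:
    """Return the fold ID (class.fold) from a SCOPe-style header.

    Assumes the second whitespace-separated field is the
    classification string, e.g. "a.1.1.1" -> "a.1".
    """
    sccs = header.split()[1]
    return ".".join(sccs.split(".")[:2])


def filter_folds(records: Iterable[Record], excluded: Set[str]) -> List[Record]:
    """Keep only records whose fold is not in the excluded set."""
    return [(h, s) for h, s in records if fold_of(h) not in excluded]


# Toy records with made-up domain IDs and sequences.
EXAMPLE = """\
>d0toya_ a.1.1.1 (A:) toy domain one
slfeqlgg
>d0toyb_ c.2.1.3 (A:) toy domain two
mkvlaa
>d0toyc_ b.1.1.1 (B:) toy domain three
gsshhh
""".splitlines()

EXCLUDED_FOLDS = {"c.2"}  # placeholder, not the paper's list

kept = filter_folds(parse_fasta(EXAMPLE), EXCLUDED_FOLDS)
kept_folds = [fold_of(h) for h, _ in kept]
print(len(kept), kept_folds)
```

Comparing `len(kept)` against the sequence count stated in the paper is a quick way to check whether a given set of filtering steps matches.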

Best, Asher Moldwin