Hello,
Thank you very much for maintaining this repository. I was wondering whether you could share the dataset used in the ESM-1b paper to evaluate remote homology prediction. The paper and supplement give some details, but I'd like to reproduce the results exactly if possible. Simply taking the 40% identity-threshold SCOPe subset and removing the specified folds leaves me with a different number of sequences than the paper states, and I am seeing significantly lower AUC and Hit-10 than reported. Were there other filtering steps that might explain the discrepancy?
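A minimal sketch of the fold-level filtering described above, assuming ASTRAL SCOPe FASTA headers of the form `>d1dlwa_ a.1.1.1 (A:) ...`, where the second field is the sccs string (class.fold.superfamily.family). `EXCLUDED_FOLDS` is a hypothetical placeholder, not the list from the paper's supplement, and the toy records are illustrative, not real SCOPe entries.

```python
# Hypothetical placeholder: substitute the fold IDs listed in the paper's supplement.
EXCLUDED_FOLDS = {"x.1", "x.2"}


def parse_astral_header(line):
    """Return (domain_id, sccs) from an ASTRAL FASTA header line."""
    fields = line[1:].split()
    return fields[0], fields[1]


def fold_of(sccs):
    """SCOPe sccs is class.fold.superfamily.family; the fold is the first two parts."""
    return ".".join(sccs.split(".")[:2])


def filter_domains(fasta_lines, excluded_folds):
    """Keep (domain_id, sccs) for every header whose fold is not excluded.

    Only header lines are inspected, so wrapped sequence lines are ignored.
    """
    kept = []
    for line in fasta_lines:
        if line.startswith(">"):
            domain_id, sccs = parse_astral_header(line)
            if fold_of(sccs) not in excluded_folds:
                kept.append((domain_id, sccs))
    return kept


# Illustrative toy records (not real SCOPe domains).
TOY = [
    ">dtoy1a_ a.1.1.1 (A:) toy domain one",
    "MKTAYIAKQR",
    ">dtoy2a_ b.1.1.1 (A:) toy domain two",
    "GSHMLEDPVA",
]

if __name__ == "__main__":
    kept = filter_domains(TOY, {"a.1"})
    print(len(kept), kept)
```

Comparing the resulting count against the number reported in the paper would show whether the gap comes from the fold exclusion itself or from an additional filtering step.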
Best, Asher Moldwin