Closed zhenyuhe00 closed 2 years ago
How are you sampling those 500k sequences? Are they chosen from UR50 clusters or just arbitrarily? If you've trained on a much smaller, less diverse set of sequences, that could be the reason for your worse performance.
Hi, congrats again on your great work!
I used the released pre-trained BERT-base checkpoint esm1_t12_85M_UR50S (pretrained on over 20 million sequences) and tested its unsupervised contact prediction performance. Its long-range P@L is about 0.20–0.30.
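For context, long-range P@L is the precision of the top-L predicted contacts (L = sequence length) restricted to residue pairs separated by at least 24 positions. A minimal NumPy sketch of this metric, assuming the standard 24-residue separation cutoff (this is not the exact ESM evaluation code):

```python
import numpy as np

def long_range_p_at_l(pred, true, min_sep=24):
    """Precision of the top-L long-range contact predictions.

    pred: (L, L) array of predicted contact probabilities
    true: (L, L) boolean array of ground-truth contacts
    Long-range pairs are those with sequence separation >= min_sep
    (24 is the conventional cutoff; an assumption here).
    """
    seq_len = pred.shape[0]
    # Upper-triangle residue pairs (i, j) with j - i >= min_sep
    iu, ju = np.triu_indices(seq_len, k=min_sep)
    # Rank long-range pairs by predicted score, take the top L
    order = np.argsort(pred[iu, ju])[::-1][:seq_len]
    # Fraction of those top-L pairs that are true contacts
    return true[iu[order], ju[order]].mean()
```

With a perfect predictor (scores equal to the true contact map) this returns 1.0, and with no true long-range contacts among the top L it returns 0.0.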
However, I also pre-trained my own BERT-base with 85M parameters, and its long-range P@L on the same test set is below 0.05. The differences between my model and esm1_t12_85M_UR50S: mine uses post-norm, a crop size of 384, and 0.5 million pretraining sequences, while esm1_t12_85M_UR50S uses pre-norm, a crop size of 1024, and over 20 million sequences.
I wonder why my BERT-base is so much worse than yours. Is it mainly because of the difference in the amount of pretraining data?
Thanks in advance!