facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Decoding SSP labels #25

Closed: chanjed closed this issue 3 years ago

chanjed commented 3 years ago

I took a look at the SSP labels that come with the Structural Split dataset, and they include (T, E, X, B, H, G, S, I, -). The paper says these labels were pulled from Joosten et al. 2010, where the labels correspond to (BEGHITS). What does the X character represent?

liujas000 commented 3 years ago

Hi! Thanks for asking. X is a padding token, so it is not supervised.
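
For anyone decoding these labels themselves, here is a minimal sketch of one way to turn a label string into class indices while leaving the X positions out of supervision. It is not taken from the esm codebase: the names `SSP_ALPHABET`, `encode_ssp`, and `supervised_mask` are hypothetical, the index order is arbitrary, and treating `-` as an eighth class is an assumption. The value `-100` is chosen only because it matches the default `ignore_index` of `torch.nn.CrossEntropyLoss`.

```python
from typing import List

# Assumption: the seven lettered states (BEGHITS) plus "-" form the
# eight classes. The index order here is arbitrary.
SSP_ALPHABET = "BEGHITS-"
PAD_LABEL = "X"  # padding token, per the answer above; not supervised

# Assumption: -100 is used because it is the default ignore_index of
# torch.nn.CrossEntropyLoss, so these targets would be skipped by that loss.
IGNORE_INDEX = -100

LABEL_TO_ID = {label: index for index, label in enumerate(SSP_ALPHABET)}


def encode_ssp(labels: str) -> List[int]:
    """Map each label character to a class index; X maps to IGNORE_INDEX."""
    return [
        IGNORE_INDEX if label == PAD_LABEL else LABEL_TO_ID[label]
        for label in labels
    ]


def supervised_mask(labels: str) -> List[bool]:
    """Return True for positions that carry a real label, False for padding."""
    return [label != PAD_LABEL for label in labels]


if __name__ == "__main__":
    example = "XHHE-TX"
    print(encode_ssp(example))
    print(supervised_mask(example))
```

A label character outside the nine listed in the question raises a `KeyError` in this sketch, which makes unexpected symbols visible rather than silently mapping them to a class.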