facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

training code for unsupervised contact prediction #249

Closed zhenyuhe00 closed 2 years ago

zhenyuhe00 commented 2 years ago

Hi guys,

Congrats on the excellent work and great results. Do you plan to release the code for training the unsupervised contact predictor?

Thanks in advance.

rmrao commented 2 years ago

This is trained using scikit-learn with the following parameters:

import sklearn.linear_model

# X: [N x (num_layers * num_attn_heads)] 2D feature array; each row holds the attentions for one residue pair (i, j) of one protein
# y: [N x 1] boolean array; whether each row corresponds to a contact
clf = sklearn.linear_model.LogisticRegression(
    penalty="l1",
    C=0.15,
    solver="liblinear",
)
clf.fit(X, y)

zhenyuhe00 commented 2 years ago

Thanks a lot!

zhenyuhe00 commented 2 years ago

What value is used for "max_iter"? Is it left at the default of 100?

zhenyuhe00 commented 2 years ago

The default number of iterations is probably enough.
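
For anyone landing here later, this is a minimal sketch of how X and y could be assembled from attention maps and passed to the classifier above. It is not the authors' training script. The helper `build_features`, the `min_sep` cutoff, the symmetrization step, and the random toy data are all assumptions made for illustration. Only the `LogisticRegression` settings come from this thread, with `max_iter` written out at its default of 100.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def build_features(attentions, contacts, min_sep=6):
    """Flatten one protein's attention maps into per-pair rows (hypothetical helper).

    attentions: float array [num_layers, num_heads, L, L]
    contacts:   bool array [L, L], True where residues i and j are in contact
    min_sep:    assumed minimum sequence separation j - i to keep
    """
    num_layers, num_heads, seq_len, _ = attentions.shape
    # Symmetrize so the feature for (i, j) does not depend on pair order.
    sym = attentions + attentions.transpose(0, 1, 3, 2)
    # Keep each unordered pair once, with j - i >= min_sep.
    i, j = np.triu_indices(seq_len, k=min_sep)
    # One row per pair, one column per (layer, head).
    X = sym[:, :, i, j].reshape(num_layers * num_heads, -1).T
    y = contacts[i, j]
    return X, y


# Toy stand-in for real model attentions and structure-derived contacts.
rng = np.random.default_rng(0)
num_layers, num_heads, seq_len = 2, 3, 40
attn = rng.random((num_layers, num_heads, seq_len, seq_len))
# Toy labels: call a pair a "contact" when one chosen head attends strongly.
contacts = (attn[1, 2] + attn[1, 2].T) > 1.2

X, y = build_features(attn, contacts)
clf = LogisticRegression(penalty="l1", C=0.15, solver="liblinear", max_iter=100)
clf.fit(X, y)
print(X.shape, y.shape, clf.coef_.shape)
```

With several proteins, the per-protein X and y arrays would be concatenated along the first axis before calling `fit`. Note that y is passed as a 1D array here, which is the shape scikit-learn expects for targets.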