facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

"unsupervised" contact prediction #96

Closed: intersun closed this issue 3 years ago

intersun commented 3 years ago

Thanks for the awesome work!

I do have one question after reading the paper; there might be some misunderstanding on my side. For the "unsupervised" contact prediction proposed in the paper, labeled data is used to fit the logistic regression, so why is it called unsupervised contact prediction? Since Potts models (e.g. CCMpred) do not use labels at all, is the comparison unfair?

Thanks again.

alexrives commented 3 years ago

Thank you for your interest in our work! We use the term unsupervised because the contacts are learned directly via the unsupervised language modeling objective. The point we are making is that the logistic regression is not learning the contacts, the language modeling is.
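For concreteness, here is a minimal sketch of pulling both the contact predictions and the raw attention maps from a pretrained model, assuming the `esm.pretrained` API shown in this repo's README:

```python
import torch
import esm

# Load a pretrained ESM-1b model and its alphabet (API as in the repo README).
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG")]
_, _, batch_tokens = batch_converter(data)

with torch.no_grad():
    results = model(batch_tokens, return_contacts=True)

# "contacts" are the L x L probabilities from the regression head over attention;
# "attentions" are the raw per-layer, per-head maps that the regression is fit on
# (attention over tokens, so special token positions are still included).
contacts = results["contacts"][0]
attentions = results["attentions"][0]
```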

Note that the attention heads predict contacts directly without using the regression weights. Results for averaging the top 1, 5, and 10 heads are shown in Table 2. Simply averaging the top 5 heads already performs better than training a Potts model using the same sequence database used for training ESM (in Table 2 compare Gremlin on ESM data to the lines for top-5 and top-10 heads). When logistic regression weights are used, they are fit with just 20 proteins. This improves performance further over averaging the heads. Figure 12 shows bootstrap results indicating that any randomly selected 20 proteins produce similar results. In the Low-N supervision section we show that the regression can be fit with even a single example.
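To make the head-averaging comparison in Table 2 concrete, here is a rough sketch (not the paper's exact code) of scoring contacts from a chosen set of heads. It assumes `attentions` is a (layers, heads, L, L) tensor for a single sequence and `top_heads` is a hypothetical list of (layer, head) indices ranked by contact precision:

```python
import torch

def apc(x):
    """Average product correction, as commonly applied to coevolution/attention maps."""
    a1 = x.sum(-1, keepdim=True)
    a2 = x.sum(-2, keepdim=True)
    a12 = x.sum((-1, -2), keepdim=True)
    return x - (a1 * a2) / a12

def contacts_from_heads(attentions, top_heads):
    """Score contacts by averaging symmetrized, APC-corrected attention maps.

    attentions: tensor of shape (layers, heads, L, L) for one sequence.
    top_heads:  hypothetical list of (layer, head) pairs, e.g. the 5 heads
                with the highest contact precision.
    """
    maps = torch.stack([attentions[l, h] for l, h in top_heads])  # (k, L, L)
    maps = apc(maps + maps.transpose(-1, -2))  # symmetrize, then APC, per head
    return maps.mean(dim=0)                    # (L, L) contact score
```

The logistic-regression variant described above simply replaces this unweighted average over a few selected heads with learned weights over the heads, fit on residue pairs from roughly 20 proteins.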

All of this is evidence that the contacts are learned by the unsupervised pre-training, which makes Potts models the natural comparison.