facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Odd Prec? #28

Closed by 227BaronChen 3 years ago

227BaronChen commented 3 years ago

I have a small question. As I understand it, predicting long-range residue contacts is harder than predicting short-range ones, but the metrics reported in the paper show just the opposite. If anyone can explain this, I would be very grateful.

nasserhashemi commented 3 years ago

I checked contact maps for over 25,000 targets, predicted from single sequences with the esm-b model, and the mean long-range precision was lower than the short-range precision, as expected.

227BaronChen commented 3 years ago

Okay, but why do Table 1 and Table 2 in the paper differ? Did I make a mistake?

nasserhashemi commented 3 years ago

I am not sure, but I think the terminology for short-range contacts is inconsistent. When people talk about short-range contacts, they often mean any i and j with |i - j| > 5, whereas the paper uses 5 < |i - j| < 12. I think that is the reason. For example, see Tables 1 and 2 in this paper: https://academic.oup.com/bioinformatics/article/28/2/184/198108

but again, I am not sure

rmrao commented 3 years ago

Yes - in the paper we specifically consider short-range contacts to be those in the range 5 < |i - j| < 12, rather than |i - j| > 5. Note this is consistent with CASP/etc., which refer to |i - j| > 5 as SR + MR + LR contacts, rather than specifically short-range contacts. The short- and medium-range precision can be quite low because a protein of length L often has fewer than L short- or medium-range contacts, so it may be impossible to reach a precision of 1.
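
To make the ceiling on precision concrete, here is a minimal sketch of a range-restricted precision over the top L predictions. It is not the evaluation code from the esm repository. The function name `precision_at_l` and the `RANGES` table are hypothetical, and only the short-range bin (5 < |i - j| < 12) comes from this thread; the medium and long bins are assumed. If a bin holds N true contacts, the best possible score is min(N, L) / L.

```python
import numpy as np

# Separation bins as (min_sep, max_sep), max exclusive.
# Only "short" (5 < |i - j| < 12) is stated in the thread;
# "medium" and "long" are assumptions made for this sketch.
RANGES = {
    "short": (6, 12),
    "medium": (12, 24),
    "long": (24, None),
}


def precision_at_l(pred, true, min_sep, max_sep=None):
    """Return (precision of the top-L scored pairs, best achievable precision).

    pred: (L, L) array of contact scores; true: (L, L) boolean contact map.
    Only pairs with min_sep <= j - i < max_sep are considered.
    """
    length = true.shape[0]
    i, j = np.triu_indices(length, k=min_sep)  # pairs with j - i >= min_sep
    if max_sep is not None:
        keep = (j - i) < max_sep
        i, j = i[keep], j[keep]
    scores = pred[i, j]
    labels = true[i, j].astype(bool)
    top = np.argsort(-scores)[:length]  # indices of the L highest scores
    precision = labels[top].sum() / length
    ceiling = min(int(labels.sum()), length) / length
    return float(precision), float(ceiling)


# Toy protein of length 20 with only four short-range contacts.
length = 20
true = np.zeros((length, length), dtype=bool)
for a, b in [(0, 7), (3, 10), (5, 13), (8, 17)]:
    true[a, b] = true[b, a] = True

pred = true.astype(float)  # a perfect predictor
short_p, short_cap = precision_at_l(pred, true, *RANGES["short"])
print(short_p, short_cap)
```

In this toy case even a perfect predictor scores well below 1 on the short-range bin, because only four of its top 20 pairs can be true contacts.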

nasserhashemi commented 3 years ago

Yes, you are right, so the paper I mentioned was inconsistent :)

tomsercu commented 3 years ago

seems like this is resolved - but feel free to open a discussion here!