Open Gloria-LIU opened 2 years ago
Hi, thank you for the amazing work! I am curious about the results in Table 8. Why do most models (other than GNN) perform dramatically worse on the 60% identity split than on the 30% identity split? Intuitively, the task with the 60% split should be easier and achieve better performance, since there is more similarity between the protein sequences.
I agree with your intuition. However, not only atom3d but also several follow-up studies show a similar phenomenon. For instance, the table below is copied from Multi-Scale Representation Learning on Proteins.