drorlab / atom3d

ATOM3D: tasks on molecules in three dimensions
https://www.atom3d.ai
MIT License

Why do models perform worse on the 60% identity split of the LBA dataset than on the 30% split? #57

Open Gloria-LIU opened 2 years ago

Gloria-LIU commented 2 years ago

Hi, thank you for the amazing work! I am curious about the results in Table 8. Why do most models (other than the GNN) perform dramatically worse on the 60% identity split than on the 30% identity split? Intuitively, the 60% split should be the easier task and yield better performance, since the training and test protein sequences are allowed to be more similar.
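To make the intuition concrete, here is a minimal toy sketch of an identity-threshold split. It is not the ATOM3D pipeline: `toy_identity` and `filter_train` are hypothetical helpers, the sequences are made up, and identity is computed as a plain position-wise match on equal-length strings rather than with a real alignment tool. It only illustrates that a looser threshold lets training proteins sit closer to the test proteins.

```python
def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned sequences)."""
    if len(a) != len(b):
        raise ValueError("toy_identity expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_train(train, test, threshold):
    """Keep training sequences whose identity to every test sequence is <= threshold."""
    return [s for s in train if all(toy_identity(s, t) <= threshold for t in test)]


# Made-up sequences for illustration only.
test_set = ["AAAAAAAAAA"]
train_pool = [
    "AAAAACCCCC",  # 5/10 positions match the test sequence
    "AACCCCCCCC",  # 2/10
    "CCCCCCCCCC",  # 0/10
    "AAAAAAAAAC",  # 9/10
]

train_30 = filter_train(train_pool, test_set, 0.30)
train_60 = filter_train(train_pool, test_set, 0.60)

# Closest training neighbour of the test sequence under each threshold.
closest_30 = max(toy_identity(s, test_set[0]) for s in train_30)
closest_60 = max(toy_identity(s, test_set[0]) for s in train_60)
print(len(train_30), closest_30)
print(len(train_60), closest_60)
```

Under the 60% threshold the test sequence has a closer training neighbour than under the 30% threshold, which is why one would expect the 60% split to be easier.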

smiles724 commented 2 years ago

I agree with your theoretical guess. However, not only ATOM3D but also several follow-up studies report a similar phenomenon. For instance, the table below is copied from "Multi-Scale Representation Learning on Proteins": [image]