DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)
MIT License

Reproduce the result for ProteinBERT #12

Closed: hxu105 closed this issue 1 year ago

hxu105 commented 1 year ago

Hi, thank you for sharing this work and answering questions. I have been trying to reproduce the ProteinBERT results reported in your paper. However, when I run the provided config file directly, the performance is only about 0.079.

The loss is pretty low on the training, validation, and test sets, but the model doesn't seem able to classify the data correctly. The downstream task is EC. Do you have any suggestions for fixing this issue?

I also tried rerunning the experiments with the HuggingFace ProtBERT. The result is again around 0.078, with low loss values. Would you be willing to give any advice on this as well?

Many thanks for your answer!

Oxer11 commented 1 year ago

Hi! The provided config file is for an unpretrained ProtBERT (with a similar but much smaller architecture). If you'd like to run the pretrained ProtBERT, you can refer to the PEER_Benchmark repo for how to use the API and checkpoints from HuggingFace.
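
For reference, here is a minimal sketch of loading a pretrained ProtBERT checkpoint through the HuggingFace `transformers` API. It is not the PEER_Benchmark code. The checkpoint name `Rostlab/prot_bert_bfd`, the helper names `format_sequence` and `embed`, and the mean-pooling readout are all assumptions made for illustration, so check them against the setup you want to reproduce.

```python
import re


def format_sequence(sequence: str) -> str:
    """Prepare a raw amino-acid string for the ProtBERT tokenizer.

    The tokenizer expects residues separated by spaces; the rare residues
    U, Z, O and B are mapped to X.
    """
    sequence = re.sub(r"[UZOB]", "X", sequence.upper())
    return " ".join(sequence)


def embed(sequences, checkpoint="Rostlab/prot_bert_bfd"):
    """Return one mean-pooled embedding per sequence.

    The checkpoint name is an assumption; weights are downloaded on first use.
    """
    # Imported lazily so format_sequence works without these packages installed.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(checkpoint, do_lower_case=False)
    model = BertModel.from_pretrained(checkpoint).eval()

    batch = tokenizer(
        [format_sequence(s) for s in sequences],
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, length, hidden)

    # Average over non-padding tokens (this includes [CLS] and [SEP]).
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed(["MKTAYIAKQR"])` should give a tensor with one row per input sequence, which can then be fed to a task-specific classification head.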

Also, I suggest trying ESM, which yields better performance and has been integrated into the TorchProtein platform.