DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)
MIT License

Low performance on training from scratch on a single GPU #4

Closed · hxu105 closed this issue 1 year ago

hxu105 commented 1 year ago

Hi, I am trying to reproduce the experiments, but the reproduced results fall well short of the paper's. Reproduced so far:

GearNet:
- EC: 0.514 (200 epochs)
- GO-BP: 0.176 (146 epochs)
- GO-CC: 0.145 (84 epochs)

GearNet-Edge:
- EC: 0.404 (163 epochs)
- GO-BP: 0.255 (100 epochs)
- GO-CC: 0.163 (107 epochs)

I use the same configuration and hyperparameters as provided in the repo. Training runs on a single GPU, and some of the experiments are still running.

Many thanks

Oxer11 commented 1 year ago

Hi! Thanks for raising this issue!

By default, we use 4 A100 GPUs for pretraining and finetuning our model. On the downstream tasks, the batch size per GPU is set to 2, which is equivalent to batch_size = 2 × 4 = 8 on a single GPU. This hyperparameter may have a large influence on your final results.

I suggest following the default setting and using 4 GPUs; with that setup, the Fmax on EC should reach 0.8 within 20 epochs. If you want to run the model on a single GPU, you can set the batch size to 8 and lower the hidden dimension, as in the sketch below. Though I haven't tried this setup, I believe it can still reach good performance.
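For concreteness, here is a minimal sketch of that single-GPU fallback using torchdrug's `core.Engine`, which the repo's configs are instantiated through. The `task`, dataset splits, and `optimizer` are assumed to be built as in the repo's downstream script, and the reduced hidden dimension is illustrative rather than a tested setting:

```python
import torch
from torchdrug import core, models

# Lower hidden dimension (illustrative; the paper uses hidden_dims=[512] * 6)
# so that the larger per-GPU batch still fits in memory.
model = models.GearNet(input_dim=21, hidden_dims=[256] * 6, num_relation=7,
                       batch_norm=True, short_cut=True, readout="sum")

# `task`, `train_set`, `valid_set`, `test_set`, and `optimizer` are assumed
# to be constructed from the repo's config, with `task` wrapping `model`.
# Engine's batch_size is per GPU, so gpus=[0] with batch_size=8 reproduces
# the default effective batch of 2 per GPU x 4 GPUs = 8.
solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
                     gpus=[0], batch_size=8)
solver.train(num_epoch=50)
```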

hxu105 commented 1 year ago

Thanks for the answer; I will try to reproduce the experiments again with more GPUs.

Oxer11 commented 1 year ago

Hi! I recently found that the learning-rate scheduler is important for performance. I've added it back into the codebase in 437333f and updated a config file for single-GPU training on EC, which can reproduce the results in the paper.
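As a sketch of what wiring the scheduler back in looks like: torchdrug's `core.Engine` accepts a `scheduler` argument. The scheduler choice and hyperparameters below are assumptions for illustration, not values read from commit 437333f; the updated single-GPU EC config has the actual settings.

```python
import torch
from torchdrug import core

# `task` and the dataset splits are assumed to be built as in the repo's
# downstream script. The scheduler type and its factor/patience values are
# illustrative guesses; check the updated EC config for the real ones.
optimizer = torch.optim.Adam(task.parameters(), lr=1e-4)
# ReduceLROnPlateau lowers the learning rate when the tracked validation
# metric stops improving ("max" mode, since higher Fmax is better).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.6, patience=5)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
                     scheduler=scheduler, gpus=[0], batch_size=8)
```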

Oxer11 commented 1 year ago

I re-ran the code after fixing the scheduler issue and attach the log file for EC with a single GPU here for your reference: gearnet_edge_ec_1gpu.txt