Mohinta2892 / catena

Neuron Segmentation, Synaptic Partner Detection and Microtubule tracking for vEM with EM-2-EM translation. Codebase built upon Funke lab's algorithms.

Model convergence across Old and New GPU architectures #15

Open Mohinta2892 opened 1 month ago

Mohinta2892 commented 1 month ago

We have seen a difference in model convergence across old and new GPU architectures. For example,

With recent Nvidia GPUs such as the V100, A100 and RTX 4090 (cards that support mixed precision), the same models converge faster even when trained in single/full precision (fp32) with batch size 1: loss ~0.007 after 16 hours of training on a 40GB A100, at 120000 epochs (of 300000 total). However, when trained on Titan XP cards, the same models converge much more slowly: loss ~0.05 after 120000 epochs, so they require more training time.
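One contributor worth ruling out (an assumption on our side, not yet verified for this issue): on Ampere-class and newer GPUs such as the A100 and RTX 4090, PyTorch runs fp32 matmuls and convolutions through TF32 by default, so "full precision" training there is not numerically identical to fp32 on a Titan XP. A minimal sketch for checking this, assuming the training scripts use PyTorch:

```python
import torch

# On Ampere+ GPUs (e.g. A100, RTX 4090) PyTorch uses TF32 for fp32
# matmuls/convolutions by default; older cards (Titan XP) cannot.
# Disabling TF32 forces true fp32 arithmetic, making runs comparable:
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

# Print what the current device actually supports, for the record.
if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor} "
          f"(TF32 available on >= 8.0)")
```

If the A100 loss curves line up with the Titan XP ones once TF32 is disabled, the difference is down to precision rather than anything deeper in the training setup.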

We need to investigate this further, but it is something to be careful of: we have seen that fast-converging models may not actually learn the task!

Please report anything like this until we get a chance to look into it further.
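To make such reports comparable, it may help to attach the exact hardware/software stack alongside each loss curve. A small helper (hypothetical, not part of catena) that collects the relevant details, again assuming PyTorch:

```python
import torch

def training_env_report() -> dict:
    """Collect GPU/software details worth attaching to a convergence report."""
    info = {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "tf32_matmul": torch.backends.cuda.matmul.allow_tf32,
    }
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = ".".join(
            map(str, torch.cuda.get_device_capability(0)))
    return info

print(training_env_report())
```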

Mohinta2892 commented 1 month ago

Using AdamW in place of Adam causes further convergence issues. For example, models can converge rapidly when AdamW is used with learning rates between 1e-2 and 1e-4, with or without BatchNorm, on a single GPU.
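One plausible factor (our assumption, not confirmed here): Adam and AdamW handle weight decay differently, so swapping them at the same learning rate changes the effective regularisation. AdamW decouples the decay from the gradient update, and PyTorch's `AdamW` also defaults to `weight_decay=1e-2` where `Adam` defaults to 0. A minimal sketch of the swap:

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the actual network

# Adam: weight decay (default 0) is added to the gradient (L2 penalty),
# so it gets rescaled by the adaptive per-parameter step sizes.
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.0)

# AdamW: weight decay (default 1e-2!) is applied directly to the weights,
# decoupled from the gradient. Swapping Adam -> AdamW without pinning
# weight_decay therefore silently changes the training dynamics.
opt_adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
```

If the rapid convergence disappears when `AdamW` is run with `weight_decay=0.0`, the decoupled decay, rather than the optimizer update itself, is the likely cause.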