-
**Describe the feature and the current behavior/state:**
Gradient accumulation is extremely useful when working with large images/volumetric data, using low-end hardware, or training on multiple GP…
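For context, the pattern being requested is the usual one: run several micro-batches, sum their gradients, then take one optimizer step. A minimal PyTorch sketch follows; the toy model, data, and the name `accum_steps` are illustrative, not from the original report.

```python
import torch
from torch import nn

# Minimal gradient-accumulation sketch; the toy model and data are
# illustrative, not from the original report.
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
accum_steps = 4  # micro-batches accumulated per optimizer step

optimizer.zero_grad()
for step in range(100):
    x, y = torch.randn(8, 16), torch.randn(8, 1)  # micro-batch of 8 samples
    # Scale the loss so the summed gradients equal the mean over the
    # effective batch of accum_steps * 8 samples.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```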
-
WARNING train_db.py:109 gradient_accumulation_steps is 3. accelerate does not support gradient_accumulation_steps when training multiple models (U-Net a…
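For reference, accelerate's built-in accumulation for the single-model case looks roughly like the sketch below (the toy model and data are illustrative; the warning above concerns the multi-model case, which that code path does not handle).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Sketch of accelerate's single-model accumulation API; the toy model
# and data are illustrative.
accelerator = Accelerator(gradient_accumulation_steps=3)
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(96, 16), torch.randn(96, 1)),
                    batch_size=8)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    with accelerator.accumulate(model):
        loss = nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)  # accelerate applies the 1/3 loss scaling
        optimizer.step()            # real step only every 3rd micro-batch
        optimizer.zero_grad()
```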
-
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.5784903139612557e-08, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logp…
-
if batch_gpu < batch_size // num_gpus, the accumulated gradient should be normalized by (num_gpus * batch_gpu) // batch_size. The current accumulation implementation does not seem to be normalized, wh…
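A sketch of the normalization being described (using true division; the names and toy model are illustrative, not from the original issue): weighting each micro-batch loss by num_gpus * batch_gpu / batch_size makes the accumulated gradient (after DDP's cross-rank averaging) equal the mean over the full batch_size samples.

```python
import torch
from torch import nn

# Illustrative normalization for accumulation: the global batch_size is
# split into accumulation rounds of num_gpus * batch_gpu samples each.
batch_size, num_gpus, batch_gpu = 64, 2, 8
rounds = batch_size // (num_gpus * batch_gpu)  # accumulation steps per update

model = nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

opt.zero_grad()
for _ in range(rounds):
    x, y = torch.randn(batch_gpu, 16), torch.randn(batch_gpu, 1)
    # Each micro-loss is a mean over batch_gpu samples; rescaling by
    # num_gpus * batch_gpu / batch_size (= 1 / rounds here) makes the sum
    # over rounds, averaged across ranks, equal the full-batch mean.
    loss = nn.functional.mse_loss(model(x), y) * (num_gpus * batch_gpu / batch_size)
    loss.backward()
opt.step()
```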
-
**Checklist**
1. I have searched related issues but cannot get the expected help. ✅
2. I have read the FAQ documentation but cannot get the expected help. ✅
Hi!
Let's say there is a model th…
-
Hi:
Thanks for your implementation. I just have a question regarding the gradient accumulation part of the NT-Xent loss. Though we divide the loss by num_accumulation_steps for each mini-batch, the f…
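For concreteness, here is a small sketch of the point at issue (a minimal SimCLR-style NT-Xent; `nt_xent` and the shapes are illustrative, not the repo's implementation): dividing by num_accumulation_steps matches the average of the mini-batch losses, but not the full-batch NT-Xent, because each mini-batch only contrasts against its own in-batch negatives.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Minimal SimCLR-style NT-Xent over in-batch negatives (illustrative)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # 2N x d embeddings
    sim = z @ z.t() / tau                          # 2N x 2N similarities
    sim.fill_diagonal_(float("-inf"))              # drop self-similarity
    n = z1.shape[0]
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

torch.manual_seed(0)
z1, z2 = torch.randn(8, 32), torch.randn(8, 32)

full = nt_xent(z1, z2)  # all 8 pairs serve as mutual negatives
# Two accumulation steps of 4 pairs each, scaled by 1/num_accumulation_steps:
accum = (nt_xent(z1[:4], z2[:4]) + nt_xent(z1[4:], z2[4:])) / 2

print(full.item(), accum.item())  # differ: mini-batches see fewer negatives
```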
-
Hi, based on the following lines, it seems gradient accumulation is not properly implemented:
https://github.com/mahmoodlab/HIPT/blob/a9b5bb8d159684fc4c2c497d68950ab915caeb7e/2-Weakly-Supervised-Su…
-
Hello, I have been reproducing this work recently and have a few questions:
1. How long does training take in total on the 8 NVIDIA Tesla 32G-V100 GPUs described in the paper?
2. The paper states that the batch size is 192 and the number of iterations is 150K, so how should the train_batch_size and gradient_accumulation_steps parameters be set? My understanding is that train_batch_size*gradient…
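If helpful, the usual bookkeeping in accelerate/diffusers-style trainers (an assumption about this codebase; worth verifying against the repo) is effective batch = per-GPU batch × accumulation steps × number of GPUs, e.g.:

```python
# Hypothetical bookkeeping (common convention; verify against this repo):
# effective batch = per-GPU batch * accumulation steps * number of GPUs.
num_gpus = 8
train_batch_size = 8              # per-GPU micro-batch (illustrative)
gradient_accumulation_steps = 3   # so 8 * 3 * 8 = 192
effective_batch = train_batch_size * gradient_accumulation_steps * num_gpus
assert effective_batch == 192
```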
-
[Errno 2] No such file or directory: '../dataset/ReC/mdetr/OpenSource/finetune_refall_train.json' when I run the command `accelerate launch --mixed_precision="fp16" --gpu_ids='all' --multi_gpu --main_pro…