-
The following code
```
#include <cstddef>
#include "clad/Differentiator/Differentiator.h"

double f(double const* const x, std::size_t const n)
{
    double acc{};
    for (std::size_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
}

int main()
{
    auto g = clad:…
```
-
**Describe the feature and the current behavior/state:**
Gradient accumulation is extremely useful when working with large images/volumetric data, using low-end hardware, or training on multiple GP…
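For reference, the standard accumulation loop in plain PyTorch looks like the sketch below; the model, data, and `accum_steps` value are placeholders, not part of this request. The key points are scaling each micro-batch loss by `1/accum_steps` and stepping the optimizer only every `accum_steps` iterations, which gives the memory profile of a small batch with the gradient of a large one.
```
import torch

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Placeholder data: 16 micro-batches of 8 samples each.
loader = [(torch.randn(8, 128), torch.randint(0, 10, (8,))) for _ in range(16)]
accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    # Scale so the accumulated gradient matches one full-batch gradient.
    loss = loss_fn(model(inputs), targets) / accum_steps
    loss.backward()  # gradients sum into .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```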
-
Hello,
I have an issue with multi-GPU performance.
- When I run the recipe `lora_finetune_single_device` with the config `mini_lora_single_device.yaml` on a 6000 Ada, I get ~5 it/s
- When I run the recipe `lo…
-
### Describe the bug
When I run the script train_dreambooth_lora_flux.py, it raises `ValueError: unexpected save model: `. Is this a bug in save_model_hook?
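For reference, below is a stripped-down, hypothetical version of the hook pattern these scripts register with accelerate; `DummyTransformer` is a stand-in, not the script's actual type. The reported error comes from a branch like the `else` below, which fires when a tracked model fails the expected `isinstance` check.
```
import torch

class DummyTransformer(torch.nn.Module):
    """Stand-in for the model type the script expects to save."""

def save_model_hook(models, weights, output_dir):
    for model in models:
        if isinstance(model, DummyTransformer):
            pass  # the real script saves this model's LoRA layers here
        else:
            # A tracked model matched none of the expected types:
            raise ValueError(f"unexpected save model: {model.__class__}")
        weights.pop()  # keep accelerate from re-saving the full weights

# An untracked module type reaching the hook reproduces the error:
save_model_hook([torch.nn.Linear(2, 2)], [object()], "out")
```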
### Reproducti…
-
Evaluations are being run, _but no validation loss is logged or sent to WandB_.
The console shows that eval is running, but displays a table along the lines of:
| eval loss | validation loss |
|…
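If the number shows up in the console but never reaches WandB, one thing worth checking is whether the eval path actually calls `wandb.log` with that key; `wandb.log` and its `step` argument are the standard API, while the key name and variables below are placeholders.
```
import wandb

run = wandb.init(project="my-project")  # placeholder project name

val_loss = 0.42    # placeholder: the loss computed by the eval loop
global_step = 100  # placeholder: the current training step

# Log explicitly, keyed by the global step so the point lines up
# with the training curves in the WandB UI.
wandb.log({"validation_loss": val_loss}, step=global_step)
```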
-
I'm bringing my own PyTorch training script, and I'm interested in using SM Debugger to profile function calls in my training jobs. The [API Glossary](https://github.com/awslabs/sagemaker-debugger/blo…
-
1. `gradient_accumulation_steps` configuration is not documented at all - it's only mentioned in the context of pipeline
2. There are no instructions on how to integrate it with the existing trainer …
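Regarding point 2, integration with an existing loop usually follows Accelerate's documented pattern, sketched below; `Accelerator(gradient_accumulation_steps=...)` and `accelerator.accumulate(model)` are real Accelerate APIs, while the model and data are placeholders.
```
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(8)]

model, optimizer = accelerator.prepare(model, optimizer)

for inputs, targets in loader:
    # accumulate() tracks the accumulation window; optimizer.step() and
    # zero_grad() only take effect on the window's last micro-batch.
    with accelerator.accumulate(model):
        loss = loss_fn(model(inputs), targets)
        accelerator.backward(loss)  # also rescales the loss by 1/steps
        optimizer.step()
        optimizer.zero_grad()
```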
-
# accelerate_config with num_processes == 3
```
compute_environment: LOCAL_MACHINE
debug: true
deepspeed_config:
  gradient_accumulation_steps: 2
  gradient_clipping: 1.0
  offload_optimizer_devi…
```
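For reference, with `num_processes == 3` and `gradient_accumulation_steps: 2`, each optimizer step consumes 3 × 2 = 6 micro-batches, so the effective batch size is 6 times the per-device micro-batch size. Such a file is typically passed as `accelerate launch --config_file accelerate_config.yaml train.py`, where the file and script names here are placeholders.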
-
Thank you very much for this work; it is very helpful for those of us with limited training resources. I am a newcomer to the field of NLP and am not very familiar with training frame…
-
Hi, based on the following lines, it seems gradient accumulation is not properly implemented:
https://github.com/mahmoodlab/HIPT/blob/a9b5bb8d159684fc4c2c497d68950ab915caeb7e/2-Weakly-Supervised-Su…
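One way to pin down what "properly implemented" means: with each micro-batch loss scaled by `1/k`, the gradients accumulated over `k` backward passes should match a single full-batch backward pass. Below is a minimal self-contained check in plain PyTorch, independent of HIPT's code.
```
import torch

def test_grad_accumulation_matches_full_batch():
    torch.manual_seed(0)
    model_a = torch.nn.Linear(4, 1)
    model_b = torch.nn.Linear(4, 1)
    model_b.load_state_dict(model_a.state_dict())

    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss_fn = torch.nn.MSELoss()

    # One full-batch backward pass.
    loss_fn(model_a(x), y).backward()

    # Two accumulated micro-batches, each loss scaled by 1/2.
    for xb, yb in zip(x.chunk(2), y.chunk(2)):
        (loss_fn(model_b(xb), yb) / 2).backward()

    for pa, pb in zip(model_a.parameters(), model_b.parameters()):
        assert torch.allclose(pa.grad, pb.grad, atol=1e-6)

test_grad_accumulation_matches_full_batch()
```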