-
@awaelchli I found that in `pretrain.py`, the gradient accumulation steps are calculated from the global batch size, the number of devices, and the micro batch size.
This works fine in a single-node setting, e.g. glo…
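A minimal sketch of how such a derivation typically looks; the function name and the divisibility check are assumptions for illustration, not code from `pretrain.py`:

```python
def accumulation_steps(global_batch_size: int, num_devices: int, micro_batch_size: int) -> int:
    """Hypothetical sketch: derive gradient accumulation steps from batch sizes.

    Each optimizer step processes num_devices * micro_batch_size samples per
    micro-step, so the accumulation count is the ratio to the global batch size.
    """
    per_micro_step = num_devices * micro_batch_size
    if global_batch_size % per_micro_step != 0:
        raise ValueError("global batch size must be divisible by num_devices * micro_batch_size")
    return global_batch_size // per_micro_step
```

For example, a global batch of 512 on 8 devices with micro batch 4 yields 16 accumulation steps; in a multi-node setting the device count would need to cover all nodes, which is where a single-node assumption breaks.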
-
Hi! While training on multiple GPUs with gradient accumulation steps > 1, there is no substantial speedup relative to a single GPU (there is a speedup when the value equals 1). I found the followin…
dprze updated
2 months ago
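One common cause of this symptom (an assumption about the setup, not something confirmed by the report) is that DDP all-reduces gradients on every micro-step instead of only at the accumulation boundary; wrapping non-boundary steps in `no_sync()` suppresses the intermediate reductions. A self-contained sketch using a stand-in model that only counts synchronizations:

```python
from contextlib import contextmanager, nullcontext

class FakeDDPModel:
    """Stand-in for torch.nn.parallel.DistributedDataParallel that only
    counts gradient synchronizations (all-reduces)."""
    def __init__(self):
        self.sync_enabled = True
        self.syncs = 0

    @contextmanager
    def no_sync(self):
        # Mirrors DDP's no_sync(): disable gradient all-reduce inside the block.
        self.sync_enabled = False
        try:
            yield
        finally:
            self.sync_enabled = True

    def backward_step(self):
        if self.sync_enabled:
            self.syncs += 1  # one all-reduce across ranks

def run_micro_steps(model, num_micro_steps, accumulation_steps, use_no_sync):
    """Run micro-steps, synchronizing only on accumulation boundaries."""
    for i in range(num_micro_steps):
        boundary = (i + 1) % accumulation_steps == 0
        ctx = model.no_sync() if (use_no_sync and not boundary) else nullcontext()
        with ctx:
            model.backward_step()
    return model.syncs
```

With 8 micro-steps and 4 accumulation steps, the `no_sync` path performs 2 all-reduces instead of 8; without it, the communication cost per sample is the same as with accumulation steps = 1, which would explain the missing speedup.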
-
Since Ada GPUs like the 4090 restrict FP8 arithmetic to `fp32` accumulation, it only achieves the same peak `TFLOPs` as `fp16xfp16` with `fp16` accumulation.
Furthermore, according to my test,…
-
Using our launcher and the latest pull of our pretrain repo you can run a Llama3 70B model as follows. Thanks to @AleHD for getting activation recompute and async working.
```
(export DP=1 PP=4 BA…
```
-
Evaluations are being run, _but no validation loss is logged or sent to WandB_
The console shows that eval is running, but displays a table along the lines of:
| eval loss | validation loss |
|…
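A hypothetical sketch of the kind of guard that can produce this symptom: the metric is computed and rendered in the console table, but the logging path silently drops values that are missing or non-finite, so nothing reaches WandB. All names here are illustrative, not from the repo:

```python
import math

def collect_loggable_metrics(eval_loss, val_loss):
    """Illustrative: build the metrics dict that would be sent to a logger.

    A filter like this lets the console table show a row while the logger
    receives nothing for a NaN or missing validation loss.
    """
    metrics = {}
    for name, value in (("eval loss", eval_loss), ("validation loss", val_loss)):
        if value is not None and math.isfinite(value):
            metrics[name] = value
    return metrics
```

If the validation loss is `NaN` or `None` at that point, only the eval loss survives the filter, matching a table that prints but a log entry that never arrives.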
-
Hello, I am trying to persist my IoT data with Cygnus LD, connecting it to PostgreSQL.
I ran the following command to create the subscription:
```
curl -L -X POST 'http://localhost:…
```
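For reference, a minimal NGSI-LD subscription that points notifications at a Cygnus endpoint typically looks like the sketch below. The entity type, broker port, and the Cygnus host and notification port are placeholders (check the Cygnus NGSI-LD agent configuration for the actual port), not values from the setup above:

```shell
curl -L -X POST 'http://localhost:1026/ngsi-ld/v1/subscriptions/' \
  -H 'Content-Type: application/ld+json' \
  -d '{
    "description": "Notify Cygnus of changes to any Device entity",
    "type": "Subscription",
    "entities": [{"type": "Device"}],
    "notification": {
      "endpoint": {
        "uri": "http://cygnus:5055/notify",
        "accept": "application/json"
      }
    },
    "@context": "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"
  }'
```

Once the broker accepts the subscription, each matching entity update is POSTed to the Cygnus `notify` endpoint, and Cygnus handles the insert into PostgreSQL.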
-
```
What steps will reproduce the problem?
1. run the 6dofhead
2. produce abrupt head motion
3. head angles will accumulate errors
What version of the product are you using? 0.7
On what operating sy…
```
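The drift described above is typical of pure gyro integration: small rate errors accumulate without bound, and abrupt motion amplifies them. A common remedy, sketched here as an assumption about a possible fix rather than the project's actual code, is a complementary filter that blends the integrated gyro angle with an absolute reference such as an accelerometer tilt estimate:

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend gyro integration (accurate short-term) with an absolute
    reference angle (stable long-term) so errors stop accumulating.

    angle:       current fused angle estimate (degrees)
    gyro_rate:   measured angular rate (degrees/second), may be biased
    accel_angle: drift-free reference angle from the accelerometer (degrees)
    """
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle
```

With a constant gyro bias, pure integration drifts linearly forever, while the fused estimate settles at a small bounded offset near the reference angle, which is the behavior the accumulated-error report is missing.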