-
Hi @anas-awadalla
As described in #124, "Our training took place on 32 80GB A100s. We trained on 5M samples from MMC4 and 10M from LAION 2B."
I am interested in the details of loss during trai…
-
### 🐛 Bug description
**Command used**
CUDA_VISIBLE_DEVICES=2,3 accelerate launch --num_processes 2 path_to_train_m3e.py path_to_model path_to_dataset \
--output-dir output_dir
**Error message**
…
-
In the uneven sharding case, the local tensor on a rank that does not hold a full shard can be wrong: it uses the full shard's stride instead of accounting for the local tensor missing some elements from t…
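To see where the "missing elements" come from, here is a minimal sketch (the chunking rule is the usual ceil-division convention, assumed rather than taken from the issue) of how a length-10 dimension shards unevenly across 4 ranks, leaving the last rank with a smaller local tensor:

```python
# Uneven sharding sketch: a dim of size 10 split across 4 ranks with
# ceil-division chunks. The last rank holds fewer rows, so reusing the
# full shard's stride/shape there would be incorrect.
import math

dim, world = 10, 4
chunk = math.ceil(dim / world)  # 3 rows per full shard
local_sizes = [min(chunk, max(0, dim - r * chunk)) for r in range(world)]
print(local_sizes)  # [3, 3, 3, 1] -- rank 3 is short two rows
```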
awgu updated
8 months ago
-
I am using dlrover on Megatron-DeepSpeed, and my machine has 4 GPUs. The hybrid parallel settings are as follows:
TP:[0,1],[2,3]
DP:[0,2],[1,3]
At the same time, I also configured DeepSpeed with Zer…
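For reference, the TP/DP layout listed above follows the usual "adjacent ranks form a TP group, strided ranks form a DP group" convention. A minimal sketch deriving those groups (this is illustrative group arithmetic, not dlrover's or Megatron's actual code):

```python
# Derive TP/DP rank groups for 4 GPUs with tensor-parallel size 2.
WORLD_SIZE = 4
TP_SIZE = 2
DP_SIZE = WORLD_SIZE // TP_SIZE

# Adjacent ranks share a TP group; ranks with the same position
# inside their TP group form a DP group.
tp_groups = [list(range(i * TP_SIZE, (i + 1) * TP_SIZE)) for i in range(DP_SIZE)]
dp_groups = [list(range(j, WORLD_SIZE, TP_SIZE)) for j in range(TP_SIZE)]

print(tp_groups)  # [[0, 1], [2, 3]]
print(dp_groups)  # [[0, 2], [1, 3]]
```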
-
### 🚀 The feature, motivation and pitch
# RFC: PyTorch DistributedTensor
We have been developing a DistributedTensor (a.k.a DTensor) concept under the [pytorch/tau](https://github.com/pytorch/ta…
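As a rough intuition for what a DTensor placement means, here is an illustrative sketch only (plain Python, deliberately not the DTensor API) of how a `Shard(0)` placement partitions a global tensor's rows across a 1-D mesh of two devices:

```python
# Illustration of Shard(0) semantics: each rank in a 1-D mesh owns a
# contiguous row-slice of the logical global tensor.
global_rows = [[0, 1], [2, 3], [4, 5], [6, 7]]  # a 4x2 "tensor"
mesh = [0, 1]                                   # two device ranks
per_rank = len(global_rows) // len(mesh)
local = {r: global_rows[r * per_rank:(r + 1) * per_rank] for r in mesh}

print(local[0])  # [[0, 1], [2, 3]]
print(local[1])  # [[4, 5], [6, 7]]
```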
-
# Overview
Likely a race condition, leading to a crash when multiple GPUs (processes) are used and the output directory doesn't exist.
## Steps to reproduce
Run a multiple GPU job with `torchrun` and …
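The usual fix for this class of crash is to make directory creation idempotent, so that whichever rank loses the race is a no-op rather than an error. A sketch (the helper name is hypothetical; whether the repo creates the directory this way is an assumption):

```python
# Idempotent output-directory creation: safe when several torchrun
# processes reach this point at the same time.
import os
import tempfile

def ensure_output_dir(path: str) -> None:
    # exist_ok=True means a concurrent mkdir by another rank is harmless;
    # without it, the losing process raises FileExistsError.
    os.makedirs(path, exist_ok=True)

demo = os.path.join(tempfile.mkdtemp(), "out")
ensure_output_dir(demo)
ensure_output_dir(demo)  # a second (racing) call is a no-op, not an error
print(os.path.isdir(demo))  # True
```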
-
Hi, thanks for this amazing repo.
I was wondering how I should set the batch size to achieve a desired full (global) batch size.
For example, if I set train_dataset.huggingface_dataset.batch_size to 1 on TPUv3-…
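The usual convention (an assumption about this repo's semantics, worth confirming with the maintainers) is that the per-device setting is multiplied by the number of devices and any gradient-accumulation steps:

```python
# Global batch size under the common data-parallel convention:
# global = per_device * num_devices * grad_accum_steps
per_device_batch = 1
num_devices = 8      # illustrative device count, e.g. 8 TPU cores
grad_accum = 1
global_batch = per_device_batch * num_devices * grad_accum
print(global_batch)  # 8
```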
-
Looks like JAX used to do some "broadcasting" here, and no longer does. Bearing in mind that a PyTree may have arrays of multiple ranks, I'm not immediately sure what the appropriate fix is.
Taggin…
-
### 🐛 Describe the bug
When sharding a model using the `fully_shard` API, any custom parameter attributes set on the unsharded model are not copied over to the sharded model. This can be observed he…
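The likely mechanism (my reading of the report, illustrated here with a plain-Python stand-in rather than `torch.nn.Parameter`) is that sharding constructs fresh parameter objects, so ad-hoc Python attributes attached to the originals are silently dropped:

```python
# Stand-in class for a parameter object; fully_shard analogously builds
# new parameter objects and does not copy ad-hoc attributes across.
class Param:
    def __init__(self, data):
        self.data = data

p = Param([1.0, 2.0])
p.my_tag = "keep-me"            # custom attribute on the "unsharded" param

sharded = Param(p.data[:1])     # a new object built from the old data
print(hasattr(p, "my_tag"))       # True
print(hasattr(sharded, "my_tag")) # False -- the attribute was lost
```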
-
```
Traceback (most recent call last):
  File "E:\Stable Diffusion\stable-diffusion-webui-amdgpu-forge\launch.py", line 51, in <module>
    main()
  File "E:\Stable Diffusion\stable-diffusion-webui-amdgpu…
```