-
### 🐛 Describe the bug
When I try to train a model using `torch.distributed.FullyShardedDataParallel`, I found that:
when training on a single node with multiple GPUs (1x8 A100), the training speed is normal.…
-
### Question
1. In my understanding, the first pretraining stage uses either the CC-3M Concept-balanced 595K dataset or the LAION/CC/SBU BLIP-Caption Concept-balanced 558K dataset. The second stage u…
-
### When did you clone our code?
I cloned the code base after 5/1/23
### Describe the issue
I manually downloaded the pre-trained models to my local path (shown here) by clicking the download button for each file.
![…
-
## Idea 💡
The **ULTIMATE** achievement for this project would be if Auto-GPT were able to recursively improve itself. That, after all, is how many predict AGI will come about.
## Suggestion …
-
I can load the checkpoint correctly if I run `train_ds.py`, but when I launch it with DeepSpeed as in the given example, this error occurs. Can you tell me how to fix it?
You are using the legacy behaviour of the . This…
-
### When did you clone our code?
I cloned the code base after 5/1/23
### Describe the issue
Issue: `scripts/deepspeed/finetune_lora.sh`
I think that in the training workflow, `--model_name_or_path` should not…
-
![image](https://user-images.githubusercontent.com/22076188/233404474-9d0977c7-c374-4aae-b673-06fabccb0466.png)
-
### When did you clone our code?
I cloned the code base before 5/1/23, but have pulled the latest code base
### Describe the issue
Issue:
After the download finishes, the `apply_delta` script is killed.
…
-
Thanks for the awesome repo and the exciting progress on multimodal learning. I'm looking forward to trying out the model and building on it, but I'm having some issues getting started with fine-tuning my…
-
### When did you clone our code?
I cloned the code base after 5/1/23
### Describe the issue
Issue:
Command:
```
torchrun --nnodes=1 --nproc_per_node=4 --master_port=25001 \
llava/train/tr…
```