-
### 🐛 Describe the bug
I'm getting profiles like this:
The first `all_to_all_single` / `barrier` are taking large amounts of time (and invoking thousands of `cudaMemcpyAsync`, `cudaGetDevice…
-
I can run the original `tutorial_train.py` on a single 3090 Ti GPU (24 GB) with batch_size 3.
However, when I scale up to 2 or more GPUs, it keeps reporting OOM.
```
trainer = pl.Trainer(gpus=2, precision=…
-
When I set:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'
it raises this error:
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [42,0,0], thread: [64,0,0] Ass…
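
For reference, a minimal sketch of restricting the visible GPUs from Python. It assumes the variable is set before any CUDA-backed library initializes the driver, since CUDA reads `CUDA_VISIBLE_DEVICES` only once, at initialization. The helper `visible_device_ids` is hypothetical: it only parses the variable (integer ids assumed, not UUIDs) and does not query the hardware.

```python
import os

# Set this before initializing any CUDA-backed library;
# changes made after CUDA initialization have no effect on the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"


def visible_device_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of physical device ids.

    Hypothetical helper for illustration; assumes integer ids only.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]


# Inside the process the listed GPUs are renumbered from 0, so logical
# device i corresponds to physical device visible_device_ids()[i].
print(visible_device_ids())
```

Printing the parsed list at startup is a quick way to confirm which physical GPUs the process was actually given before digging into the assertion itself.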
-
https://github.com/microsoft/DeepSpeed
DeepSpeed is a state-of-the-art library that enables many inference optimizations out of the box.
While especially good for clusters, I believe it can bring…
-
### Is there an existing issue for this problem?
- [X] I have searched the existing issues
### Operating system
Windows
### GPU vendor
Nvidia (CUDA)
### GPU model
RTX 4090
### GPU VRAM
24
##…
-
## 🐛 Bug
Can't load the wav2vec checkpoint as described here: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
### To Reproduce
Run this colab
https://colab.research.googl…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
**System information**
- Alpa version: 0.2.2
- Are you willing to contribute it (Yes/No):
No.
**Describe the new feature and the current behavior/state**
Alpa supports OPT-175B currently. But t…
-
![image](https://user-images.githubusercontent.com/20476674/212067731-50506295-9e27-41f3-ab25-558ade9e5fbb.png)
It seems that a 32 GB GPU is not enough. How much GPU memory is needed for normal op…
-
Hi authors,
I am curious about the model's performance on the Waymo dataset, which is not mentioned in the paper. May I ask if you have conducted any relevant experiments and what were the res…