-
### Describe the bug
Hello, I implemented my own custom pipeline (RepDiffusionPipeline) based on StableDiffusionPipeline, but I am running into some issues.
I called `accelerator.prepare` properly, and mapp…
-
When I run
```
CUDA_VISIBLE_DEVICES=0 python fixmatch.py --filters=32 --dataset=cifar10.0@40-1 --train_dir ./experiments/fixmatch
```
I can see in `nvidia-smi` that the process uses GPU memory, but there is…
-
I am training on a VM, but training only runs on one CPU. I expected that setting `GPU:0` in the config would make it train on the GPU automatically, but it still runs only on the CPU. Tensorflow GPU and CUDA are in…
-
@danielhanchen Hi, could you please give some advice on this issue? DPO training fails with DeepSpeed ZeRO-3 offload.
```
pip install "unsloth[cu121-ampere-torch211] @ git+https://github.com/uns…
-
Hello DeepSpeed :)
I am trying to use the [Pipeline module](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/pipe/engine.py) to train a pipeline-parallel model on multiple nodes. I …
-
## Type of feedback
## Description
In the tutorial "Getting Started With WordPress: Get Familiar", under the "The 'Remember Me' Checkbox" heading, there is a broken image path.
On the tutori…
-
I greatly appreciate your work, both for the library's ease of use and for your commitment. I may be wrong, but the library seems very slow compared to other packages that do the same job.
I c…
-
## Description
I built the LightGBM CUDA implementation and set the GPU parameters `"device": 'cuda'` and `"num_gpu": 4`.
But I got a Fatal Error: Currently cuda version only supports training on a sing…
-
First, thank you so much for sentence-transformers.
How can I get the embedding vector when the input is already tokenized?
I understand that sentence-transformers can `.encode(original text)`.
But I want …
sogm1 updated 6 months ago
-
# 🐛 Bug
There are cases in which code run on the CPU throws a `NanError`, while the same code run on the GPU throws no error but produces NaNs (e.g., in the training loss).
## To reproduce
```python…