-
## Describe the bug
Running on a system with multiple GPUs fails.
## To Reproduce
Set up a system with 2 GPUs.
Run the training command:
```sh
python yolo/lazy.py task=train dataset=coco use_wan…
```
-
## 🐛 Bug
As shown below, Thunder is slower than torch.compile for Phi-3-mini-4k-instruct with both DDP and FSDP with zero2.
![image](https://github.com/user-attachments/assets/f119fcb0-f5b3-4338…
-
**Describe the bug**
The FourierEncoder module assumes that the input data is in the format (x, y, z, time, charge, auxiliary). However, for the IceCube86 detector, for example, data comes in the format …
-
### Motivation
I would like to use Optuna with the PytorchLightningPruningCallback in a codebase with a pre-2.0 version of PyTorch Lightning. As it stands, I need to vendor the callback to support usi…
-
## 🐛 Bug
When training the models 'vicuna-7b-v1.5-16k', 'longchat-13b-16k', 'Mistral-7B-v0.2', 'falcon-180B', 'Llama-3-70B', and 'CodeLlama-34b-hf' with FSDP and FP8, we get `KeyError: 'scaling_fwd'`. This m…
-
- OS: Ubuntu 20.04
- Python: 3.10
- CUDA: 12.0
- GPU: 4090
- Torch: 1.13 + CUDA 11.7
- NVIDIA driver: 525.85.12
- Using: **fp16** mixed precision (fp32 is fine)
I have tried various methods:
- install spconv_…
-
## 🐛 Bug
When running Phi-3.5-mini-instruct and Qwen2.5-7B-Instruct with NeMo + ThunderFX, we get the following error:
```
0: File "/usr/lib/python3.10/copy.py", line 153, in deepcopy
0: y = copier(memo…
```
-
```
[rank1]: Traceback (most recent call last):
[rank1]: File "/mnt/sdb/humannorm/launch.py", line 237, in <module>
[rank1]: main(args, extras)
[rank1]: File "/mnt/sdb/humannorm/launch.py", line 1…
```
-
Hi Daniel,
I'm excited about your explanations of PyTorch for deep learning. Could you make a stand-alone course on PyTorch Lightning?
-
When I tried to evaluate the lart model, I followed your instructions, installed all the dependencies, and downloaded all the datasets, but encountered an error:
FileNotFoundError: [Errno 2…