-
Hello,
I'm having compile failures with `compile_pytorch_model.py`. Here's my failure:
```bash
/drp-ai_tvm/tutorials# python3 compile_pytorch_model.py /home/models/spark_torch.pt -o spark_torch -s…
-
Hi and thanks for the great resources.
I used "train-deploy-llama3.ipynb" and trained a Llama3 model similar to the one shown in the notebook.
I pushed my model to Hugging Face and now I want to use that …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussi…
-
Hi, I am running into an issue during training, fine-tuning, and evaluation. The error is `AttributeError: module 'Polygon' has no attribute 'Polygon'`. I have already installed Polygon via `pip install Polygon`. Any s…
-
We have a lot of Sphinx Doc Warnings, some of which we don't know where they come from. We should aim towards zero warnings.
Some examples:
```
/home/robintibor/work/braindecode-dev/braindecode/b…
-
I recently downloaded kohya_ss and set everything up according to the instructions, then prepared the dataset and started training a LoRA. During training I always get this error:
```
Traceback (most recen…
-
Hi, I have a script that runs with the DataParallel trainer on a machine with 8 H100 GPUs (AWS p5 VM) with DeepSpeed. When we run the script, it randomly gets stuck forever at some iteration r…
-
We need to make sure our Training Module names match in all occurrences (e.g. 'Reproducibility Basics' versus 'Computational Basics').
-
**Describe the bug**
I launch DeepSpeed training for a 600M-parameter diffusion model and vary only `reduce_bucket_size`.
I tried the following values:
- `reduce_bucket_size: 500_000_000` — conve…
-
Hi,
I have encountered an issue where the dataset I provide is too large to be read, and if it is particularly large, it can cause the process to be killed. For example,
Loading extension modul…