-
### Describe the issue
I am re-training some ONNX models from the [ONNX Model Zoo Repo](https://github.com/onnx/models/tree/main/vision/classification/resnet), especially ResNet50. Previously, I crea…
-
### Discussed in https://github.com/microsoft/onnxruntime/discussions/19390
Originally posted by **Marouan-st** February 2, 2024
Hello,
I would like to implement a custom loss to be able t…
-
I need help training a Flux LoRA on multiple GPUs. The memory on a single GPU is not sufficient, so I want to train on multiple GPUs. However, configuring `device: cuda:0,1` in the config file doesn't see…
-
### Describe the issue
Hi everyone, I'm trying to force some parameter values (convolution layer weights) during the re-training process using the On-Device Training features -> [onnx-runtime-training-exa…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Trai…
-
Hello, thank you for your earlier answer. My GPU is an RTX 3090. Can your code easily run training on A100 nodes with SkyPilot?
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
…
-
How can I run the LoRA trainer on my second CUDA device?
It seems to be working fine on the first CUDA device:
```
...
08/22/2024 17:04:37 - INFO - __main__ - Distributed environment: DEEPSPEE…
-
### What you would like to be added?
## Description
We are proposing changes to enhance training job restarts, which can help avoid restart failures and delays in case of GPU instance/k8s node fai…
-
### Describe the issue
I am re-training some ONNX models from the [ONNX Model Zoo Repo](https://github.com/onnx/models/tree/main/vision/classification/resnet), especially quantised ResNet50 with INT8 dat…