-
I am trying to run YOLOv8 on my AMD RX 550 using DirectML, and every time I run training it fails with this error:
```
RuntimeError: Cannot set version_counter for inference tensor
```
`from ultralyti…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
I ran into an issue trying to train flan-t5 on an M1 using torchmetrics. Training metrics worked fine, but I got the following stack trace when calculating evaluation metrics:
```commandline
...
…
```
-
### System Info
LLM, fine-tuning, neural networks, hyperparameters, batch size, learning rate, Hugging Face
### Who can help?
@BenjaminBossan @sayakpaul @stevh
### Information
- [ ] The officia…
-
Hi, thanks for your implementation. I'm not sure why, but training gets stuck at line 87 of train.py: `loss, preds = model.fit(x, y)`.
The tqdm progress bar doesn't seem to advance:
![image](https://github.com/user-attachment…
-
Hi, following up on (https://github.com/mobiusml/hqq/issues/107), I want to reproduce your HQQ+ paper, but I ran into some problems during training:
1. Loss doesn't decrease; it fluctuates up and down…
-
I have a couple of questions.
1. In the paper, you say you used a batch size of 1920 for the stage 2 experiments. Can you tell me how many GPU nodes you actually used, and how many GPUs per node?
2. …
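For context on why the breakdown matters: the global batch is roughly nodes × GPUs per node × per-GPU batch × gradient-accumulation steps, so 1920 can be reached with many different layouts. The sketch below enumerates a few candidates; the function name `decompositions` and the assumed GPU-per-node options (4 or 8) are hypothetical and not taken from the paper.

```python
def decompositions(global_batch, gpus_per_node_options=(4, 8), max_nodes=32, accum_steps=1):
    """Return (nodes, gpus_per_node, per_gpu_batch) triples such that
    nodes * gpus_per_node * per_gpu_batch * accum_steps == global_batch."""
    out = []
    for gpn in gpus_per_node_options:
        for nodes in range(1, max_nodes + 1):
            # Number of micro-batches contributing to one optimizer step.
            contributors = nodes * gpn * accum_steps
            if global_batch % contributors == 0:
                out.append((nodes, gpn, global_batch // contributors))
    return out


if __name__ == "__main__":
    # Hypothetical layouts with 8 GPUs per node and no gradient accumulation.
    for nodes, gpn, per_gpu in decompositions(1920, gpus_per_node_options=(8,), max_nodes=16):
        print(f"{nodes} node(s) x {gpn} GPUs x {per_gpu} per GPU = 1920")
```

Each of these layouts gives the same global batch but can differ in per-GPU memory use and, with BatchNorm-style layers, in training behavior, which is why the exact node/GPU counts are useful for reproduction.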
-
### System Info
```
accelerate 0.33.0
peft 0.12.0
Python 3.12.5
macOS: 15.0
MacBook Pro M1 Pro, 16 GB
```
### Who can help?
### Information
- [ ] The officia…
-
# Single-Device-Abstract DDP
## Motivation
In current PyTorch DDP, when training a model with Dropout operations, the final results obtained from distributed training will not be consistent with t…
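To illustrate the inconsistency, here is a minimal pure-Python simulation. It uses no real DDP: the "ranks" are just list slices, and all function names are hypothetical. When each rank naively draws a dropout mask only for its local shard from an identically seeded RNG, every rank gets the same mask, and the concatenated masks generally differ from the mask a single device would draw for the full batch. Having each rank draw the full-batch mask and keep only its own slice reproduces the single-device mask; this slicing approach is an assumption for illustration, not necessarily the mechanism this proposal adopts.

```python
import random


def dropout_mask(n, p, rng):
    """Bernoulli keep-mask: 1 keeps an element (probability 1 - p), 0 drops it."""
    return [1 if rng.random() >= p else 0 for _ in range(n)]


def single_device_mask(batch_size, p, seed):
    # One device sees the whole batch and draws one mask for it.
    return dropout_mask(batch_size, p, random.Random(seed))


def naive_ddp_masks(batch_size, world_size, p, seed):
    # Each rank seeds its RNG identically and draws a mask only for its local shard.
    shard = batch_size // world_size
    return [dropout_mask(shard, p, random.Random(seed)) for _ in range(world_size)]


def sliced_ddp_masks(batch_size, world_size, p, seed):
    # Each rank draws the full-batch mask from the shared seed and keeps its own slice.
    shard = batch_size // world_size
    full = dropout_mask(batch_size, p, random.Random(seed))
    return [full[r * shard:(r + 1) * shard] for r in range(world_size)]


if __name__ == "__main__":
    single = single_device_mask(8, 0.5, seed=0)
    naive = [m for rank in naive_ddp_masks(8, 2, 0.5, seed=0) for m in rank]
    sliced = [m for rank in sliced_ddp_masks(8, 2, 0.5, seed=0) for m in rank]
    print("single:", single)
    print("naive :", naive, "matches single:", naive == single)
    print("sliced:", sliced, "matches single:", sliced == single)
```

The slicing variant trades extra RNG work (every rank generates the full-batch mask) for bitwise agreement with the single-device run.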
-
- [x] System bootup, eth training, reset
- [x] Try out CPP unit tests on all chips
- [ ] Tweak multi device APIs if needed
- [ ] Running models