-
## 🚀 Feature
[Documentation says](https://lightning.ai/docs/pytorch/latest/advanced/compile.html#limitations) that torch compile is not supported over distributed training right now. Since torch co…
-
Hello, thank you very much for your contribution. I encountered the following error while debugging: when I run `bash tools/prepare_places_val.sh`, I get `No module named 'training'`.
![20…
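A common cause of this kind of error is that the script is launched from a directory where the repository root is not on Python's import path. The sketch below only checks that assumption; `add_repo_root` is a hypothetical helper, and the location of the `training` package relative to the repository root is assumed, not confirmed by the report.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def add_repo_root(repo_root: str) -> bool:
    """Put repo_root on sys.path and report whether `training` can be found.

    Hypothetical helper: it assumes the `training` package sits directly
    under the repository root.
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    # Make sure the import system notices packages in the new entry.
    importlib.invalidate_caches()
    return importlib.util.find_spec("training") is not None


if __name__ == "__main__":
    # Run from the repository root; True means the import should now work.
    print(add_repo_root("."))
```

If this prints `False` from the repository root, the missing module is probably not a path problem and the package itself may be absent.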
-
We recognize that in the missing-parameter multi-turn dataset, if a round has a missing parameter, the ground truth label is an empty list (or undecodable), which is the same as an irrelevance signal. …
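The overlap described above can be sketched as a single check. This is a minimal illustration, assuming labels arrive either as already-decoded lists or as JSON strings; `is_irrelevance_label` is a hypothetical name and not part of the benchmark's code.

```python
import json
from typing import Any


def is_irrelevance_label(raw_label: Any) -> bool:
    """Return True when a ground-truth label is an empty list or undecodable.

    Assumption: string labels are JSON-encoded; anything that fails to
    decode is treated the same as an empty list.
    """
    if isinstance(raw_label, str):
        try:
            raw_label = json.loads(raw_label)
        except json.JSONDecodeError:
            return True  # undecodable label
    return isinstance(raw_label, list) and len(raw_label) == 0


if __name__ == "__main__":
    print(is_irrelevance_label([]))                     # empty list
    print(is_irrelevance_label("not json {"))           # undecodable string
    print(is_irrelevance_label([{"name": "get_time"}])) # real function call
```

Under this check a missing-parameter round and an irrelevance round cannot be told apart from the label alone, which is the ambiguity the text points out.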
-
## Description
The meta dtypes of `bmm` differ between inference and training. For inference the dtype is implicit, while for training an explicit but wrong dtype is set.
## Reproduction
```
import t…
-
### System Info
I am using trl version 0.11.3, installed via pip. I want to reproduce the example code from the Hugging Face tutorial (https://huggingface.co/docs/trl/main/en/dpo_trainer), but I encoun…
-
Hello, the internvl2-8b checkpoint obtained after training with Swift 2.25 or later defaults to the regular LoRA target modules, and the current regular LoRA target modules cannot use imdep…
-
### System Info
Traceback (most recent call last):
  File "/home/sa/swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/home/sa/swift/swift/utils/run_utils.py", line 32, in x_main
    result =…
-
I reinstalled flash-attn with `pip install flash-attn==2.6.1` in the NGC PyTorch Docker image 24.06.
When I run a training job, I get the following error:
```
Traceback (most recent call last):
File "/data1/nfs15/nfs/bigdata/zha…
-
### System Info
- `transformers` version: 4.44.0
- Platform: Linux-5.4.0-196-generic-x86_64-with-glibc2.31
- Python version: 3.12.0
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
Hi!
What is the exact learning rate schedule as training iterations progress?
- I am currently trying SFT, so I set the learning rate to 1e-5.
- It seems the learning rate scheduler is AnnealingLR from the SAT module, but tri…
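For comparison against logged values, here is a generic warmup-then-decay schedule in plain Python. It is only a sketch: the linear warmup, the cosine decay, and every parameter except the 1e-5 base rate are assumptions, and nothing here is confirmed to match what AnnealingLR in SAT actually does.

```python
import math


def sketch_lr(step: int,
              base_lr: float = 1e-5,     # value from the question
              warmup_steps: int = 100,   # assumed
              total_steps: int = 1000,   # assumed
              min_ratio: float = 0.1) -> float:  # assumed floor, as a fraction of base_lr
    """Linear warmup to base_lr, then cosine decay down to base_lr * min_ratio."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)


if __name__ == "__main__":
    for step in (0, 50, 99, 100, 500, 1000):
        print(step, sketch_lr(step))
```

Printing the optimizer's actual learning rate at the same steps and comparing it with a curve like this is one way to infer which schedule shape is in effect.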