-
### System Info
Output from `transformers-cli env`:
```
- `transformers` version: 4.45.2
- Platform: Linux-6.1.0-21-cloud-amd64-x86_64-with-glibc2.36
- Python version: 3.12.5
- Huggingfa…
-
## 🐛 Bug
Hi, we are using lightning with litdata on our local machine and aws s3 system. However, training would hang randomly during the very first iterations with ddp and remote cloud directory.
…
-
Excellent work!
I have completed Preprocessing: Estimate the SMPL-X body mesh using ECON and segment the input image into RGBA format
And use Python run. py -- config config/humanref. yaml -- trai…
-
Hello!
I absolutely love sight reading trainer, but I notice that the only way to create a new play along-song is by inputting each note with LML. I don't know anything about coding, so I don't kn…
-
**Describe the bug**
4xA100 gpu fine-tuning llama-3.1-8b-instruct (also tried llama2-13b-ms, same error), cli
```
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
LOCAL_WORLD_SIZE=4 \
swift…
-
**Describe the bug**
I’m experiencing an issue when fine-tuning the Llama-2-7b model from Hugging Face with Zero optimization enabled. I am running on 8 Intel Max 1550 GPUs using the code from the exa…
-
### Bug description
Hello,
i am using `SLURMEnvironment` plugin to resubmit jobs automatically. So far it has been working seamlessly
on my academic cluster, but recently when the auto-requeue si…
-
### Bug description
Running with the LightningCLI, MLflow logger, and `MLFLOW_TRACKING_URI` environment variable set causes an assertion failure with logging. I think using a remote tracking server…
-
## 🐛 Bug Description
## To Reproduce
Steps to reproduce the behavior:
1. **from qlib.rl.trainer import Checkpoint, train**
1.
1.
## Expected Behavior
## Screenshot
##…
-
### Bug description
I'm trying to use LightningCLI to configure my code from the command line but LightningCLI is seeming to have trouble parsing the default logger of the trainer when I run the fo…