-
When launching the NeMo model in Docker, I get errors.
This issue was not present in the [pull request](https://github.com/ppisljar/Slovene_ASR_e2e/tree/master). The problem is probably in the pydant…
-
I am currently trying to run a k-fold training loop. At the end of each iteration I free memory using `gc.collect()` and `torch.cuda.empty_cache()`, but this does not seem to release the memory completely. I leave the c…
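`gc.collect()` can only free objects that nothing references anymore, and `torch.cuda.empty_cache()` can only return cached blocks that no live tensor occupies. One frequent reason the two calls seem insufficient (an assumption here, since the rest of the report is cut off) is a strong reference that survives the fold, for example a model stored in a results list. The minimal standard-library sketch below uses a hypothetical `FakeModel` as a stand-in for a PyTorch model and a `weakref` probe to check whether each fold's model was actually freed after `del` plus `gc.collect()`.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a large model (e.g. a torch.nn.Module)."""

    def __init__(self, fold: int) -> None:
        self.fold = fold
        self.weights = bytearray(10_000_000)  # ~10 MB payload


def run_kfold(n_folds: int, keep_models: bool) -> list:
    """Run a dummy k-fold loop; report per fold whether the model was freed."""
    kept = []   # anything appended here stays alive across folds
    freed = []
    for fold in range(n_folds):
        model = FakeModel(fold)
        probe = weakref.ref(model)  # does not keep the model alive
        if keep_models:
            kept.append(model)      # a lingering strong reference
        # ... train and evaluate the fold here ...
        del model     # drop the loop's own reference first
        gc.collect()  # collects only objects with no remaining references
        # In a real PyTorch loop, torch.cuda.empty_cache() would go here.
        freed.append(probe() is None)
    return freed
```

With `keep_models=False` every fold reports `True`; with `keep_models=True` every fold reports `False`, because `kept` still holds each model, so neither call can reclaim its memory.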
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Bug summary
A memory leak issue is observed in TorchIO version 20.0.1 during prolonged training sessions…
-
### Feature request
Good day. Could you add the current learning rate to the output of the training process?
Currently, when we train a model with `Trainer`, we get the following output (without the learning rate):
I'm using th…
-
### Bug description
Would it be possible for Lightning to raise an error if `SLURM_NTASKS != SLURM_NTASKS_PER_NODE` when both are set?
With a single node the current behavior is:
* `SLURM_NT…
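The requested check could look roughly like the sketch below. `check_slurm_ntasks` is a hypothetical helper, not part of Lightning's API, and it assumes `SLURM_NTASKS_PER_NODE` holds a plain integer. On a single node (node count defaults to 1) it reduces to the equality asked for above; multiplying by `SLURM_JOB_NUM_NODES` (falling back to `SLURM_NNODES`) is an added assumption so that valid multi-node jobs are not rejected.

```python
import os
from typing import Mapping, Optional


def check_slurm_ntasks(env: Optional[Mapping[str, str]] = None) -> None:
    """Hypothetical helper: raise if SLURM_NTASKS disagrees with the
    per-node task count times the node count (only when both are set)."""
    env = os.environ if env is None else env
    ntasks = env.get("SLURM_NTASKS")
    per_node = env.get("SLURM_NTASKS_PER_NODE")
    if ntasks is None or per_node is None:
        return  # nothing to compare unless both variables are set
    # Assumed node-count source; defaults to a single node.
    nodes = env.get("SLURM_JOB_NUM_NODES", env.get("SLURM_NNODES", "1"))
    expected = int(per_node) * int(nodes)
    if int(ntasks) != expected:
        raise ValueError(
            f"SLURM_NTASKS={ntasks} does not match "
            f"SLURM_NTASKS_PER_NODE={per_node} x {nodes} node(s) = {expected}"
        )
```

Passing the environment as a mapping keeps the check easy to test without touching `os.environ`.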
-
Hi all, I was giving the `CPUOffloadOptimizer` a try and found two issues when using it with QLoRA on a single device in torchtune:
1. When using an LR scheduler, I got […]. Maybe there is a way to inherit the opt…
-
### System Info
- `transformers` version: 4.46.3
- Platform: Linux-6.1.0-28-cloud-amd64-x86_64-with-glibc2.36
- Python version: 3.12.7
- Huggingface_hub version: 0.26.3
- Safetensors version: 0.4…
-
### System Info
Latest TRL, installed from source. I can't run `trl env` right now because the cluster is shut down, but I'm installing everything from source.
If required, I will restart the cluster and run it.
### Information
- [ ] Th…
-
### Describe the bug
I'm getting an error when trying to train an NER model on a custom dataset. This was working back in December 2023. I have trained a model using the same data and FLAIR version 0.13.1 but…
-
### System Info
torch==2.4.0
transformers==4.43.4
trl==0.9.6
tokenizers==0.19.1
accelerate==0.32.0
peft==0.12.0
datasets==2.20.0
deepspeed==0.15.0
bitsandbytes==0.43.3
sentencepiece==0.2.0
…