typing-trainer Search Results

clarinsi/Slovene_ASR_e2e #6

Error when launching on docker

When launching the NeMo model in Docker I get errors. This issue was not present in the [pull request](https://github.com/ppisljar/Slovene_ASR_e2e/tree/master). The problem is probably in the pydant…

SninaH updated 1 week ago

huggingface/setfit #567

Trainer does not release all CUDA memory

I am currently trying to run a k-fold training loop. At the end of each iteration I free memory using `gc.collect()` and `torch.cuda.empty_cache()`, but this does not seem to do the job completely. I leave the c…

lopozz updated 1 week ago

fepegar/torchio #1222

Memory leak with TorchIO 0.20.1

### Is there an existing issue for this? - [X] I have searched the existing issues ### Bug summary A memory leak issue is observed in TorchIO version 0.20.1 during prolonged training sessions…

FlorianScalvini updated 1 month ago

huggingface/transformers #27631

Add the learning rate (in exponential representation, like "…

### Feature request Good day. Could you add the current learning rate to the output of the training process? Currently, if we train a model with Trainer we get this output (without the learning rate): I'm using th…

artyomboyko updated 3 months ago
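A minimal sketch of the formatting this request describes, assuming the logged metrics are a plain Python dict. `format_lr` and `add_lr_to_logs` are hypothetical helpers for illustration, not part of the transformers `Trainer` API:

```python
def format_lr(lr: float, digits: int = 2) -> str:
    # Hypothetical helper: render a learning rate in exponential notation,
    # e.g. 3e-4 -> "3.00e-04".
    return f"{lr:.{digits}e}"


def add_lr_to_logs(logs: dict, lr: float) -> dict:
    # Hypothetical helper: return a copy of a metrics dict with the
    # current learning rate added as a formatted string.
    out = dict(logs)
    out["learning_rate"] = format_lr(lr)
    return out


print(add_lr_to_logs({"loss": 0.42, "epoch": 1.0}, 3e-4))
# {'loss': 0.42, 'epoch': 1.0, 'learning_rate': '3.00e-04'}
```

Formatting only at display time keeps the underlying value a float, which is usually what downstream loggers expect.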

Lightning-AI/pytorch-lightning #20391

Error if SLURM_NTASKS != SLURM_NTASKS_PER_NODE

### Bug description Would it be possible for Lightning to raise an error if `SLURM_NTASKS != SLURM_NTASKS_PER_NODE` in case both are set? With a single node the current behavior is: * `SLURM_NT…

guarin updated 2 weeks ago
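A sketch of the check requested here, assuming a single-node job as in the issue's example. On multi-node jobs `SLURM_NTASKS` is expected to exceed `SLURM_NTASKS_PER_NODE`, so this version skips the comparison whenever `SLURM_NNODES` is set to something other than `1`. `check_slurm_task_counts` is a hypothetical helper, not Lightning's API:

```python
import os
from typing import Mapping, Optional


def check_slurm_task_counts(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise if SLURM_NTASKS and SLURM_NTASKS_PER_NODE disagree on a single-node job.

    Hypothetical helper for illustration; not part of Lightning's API.
    """
    env = os.environ if env is None else env
    ntasks = env.get("SLURM_NTASKS")
    per_node = env.get("SLURM_NTASKS_PER_NODE")
    if ntasks is None or per_node is None:
        return  # only compare when both variables are set
    # Assumption: on multi-node jobs the totals legitimately differ,
    # so this sketch only checks single-node allocations.
    if env.get("SLURM_NNODES", "1") != "1":
        return
    if int(ntasks) != int(per_node):
        raise RuntimeError(
            f"SLURM_NTASKS={ntasks} does not match "
            f"SLURM_NTASKS_PER_NODE={per_node} on a single-node job"
        )
```

For example, `check_slurm_task_counts({"SLURM_NTASKS": "1", "SLURM_NTASKS_PER_NODE": "2"})` raises `RuntimeError`, while an environment with only one of the two variables set passes silently.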

pytorch/ao #1209

CPUoffloadOptimizer issues

Hi all, I was giving the CPUOffloadOptimizer a try and found two issues when using it with QLoRA on a single device in torchtune: 1. When using an LR scheduler I got. Maybe there is a way to inherit the opt…

felipemello1 updated 2 weeks ago

huggingface/transformers #35086

CausalLM loss function throws runtime error in multi-gpu set…

### System Info - `transformers` version: 4.46.3 - Platform: Linux-6.1.0-28-cloud-amd64-x86_64-with-glibc2.36 - Python version: 3.12.7 - Huggingface_hub version: 0.26.3 - Safetensors version: 0.4…

xspirus updated 6 hours ago

huggingface/trl #2215

[GKD] mismatch in tensors when stacking log probs

### System Info Latest TRL from source. I can't run TRL env right now because the cluster is shut down, but I'm installing everything from source. If required, I will restart the cluster and run it. ### Information - [ ] Th…

nivibilla updated 1 month ago

flairNLP/flair #3473

[Bug]: TypeError: 'Token' object is not subscriptable

### Describe the bug Getting an error when trying to train an NER model using a custom dataset. This was working back in Dec 2023. I have trained a model using the same data and FLAIR version 0.13.1 but…

kdk2612 updated 5 months ago

huggingface/trl #2250

OOM when unwrap_model_for_generation

### System Info torch==2.4.0 transformers==4.43.4 trl==0.9.6 tokenizers==0.19.1 accelerate==0.32.0 peft==0.12.0 datasets==2.20.0 deepspeed==0.15.0 bitsandbytes==0.43.3 sentencepiece==0.2.0 …

hlnchen updated 2 weeks ago

1000+ results for typing-trainer