memory-trainer Search Results

YuHengsss/YOLOV #109

AttributeError: 'Exp' object has no attribute 'pre_no_aug'

2024-11-05 09:04:41 | INFO | yolox.core.vid_trainer:240 - ---> start train epoch1 2024-11-05 09:04:41 | INFO | yolox.core.vid_trainer:235 - Training of experiment is done and the best AP is 0…

leilanfang updated 2 days ago

huggingface/transformers #33717

Trainer class causes massive memory leak when using mps

### System Info - `transformers` version: 4.44.2 - Platform: macOS-14.4-arm64-arm-64bit - Python version: 3.12.2 - Huggingface_hub version: 0.24.5 - Safetensors version: 0.4.3 - Accelerate versi…

JamesBowerXanda updated 1 week ago

Doubiiu/DynamiCrafter #141

CUDA Out of Memory Error on a 32GB GPU when Running trainer.…

**Description**: Hello, I encountered a `torch.cuda.OutOfMemoryError` while fine-tuning a model using `trainer.py`. My setup includes only a single GPU with 32GB of memory, and the error occurs eve…

xlnn updated 1 week ago

modelscope/ms-swift #2416

Training proceeds normally but OOM occurs when saving a checkpoint

## Environment - GPU: A100 - VRAM: 40G - SWIFT version: v2.5.2 ## Training script ``` CUDA_VISIBLE_DEVICES=0 PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" swift sft \ --model_type llama3_2-11b-vision-instruct …

mirrorange updated 9 hours ago

Lightning-AI/pytorch-lightning #20299

Inconsistent memory usage comparing to huggingface trainer wh…

### Bug description I was able to fine-tune a 8B LLM using Huggingface training framework with PEFT+DeepSpeed stage 2 under fp16 precision(mixed precision training). Recently I would like to change…

mickeysun0104 updated 1 month ago

qubic-li/client #112

FATAL: GLIBC Version must by >= 2.34. Trainer can't be start…

Hello. I have been using qli-Client 2.2.1 for quite some time without any problems. I am running it on wsl2 of windows 11 with an ubuntu 22.04.5. I mine with my nvidia 4090 GPU. For the drivers,…

Cedyy updated 1 hour ago

Toni-SM/skrl #219

Example error caused by create_tensor(... keep_dimensions) d…

### Description Hello, when I run the example files provided by the [official document] (https://github.com/Toni-SM/skrl/blob/main/docs/source/examples/deepmind/dm_manipulation_stack_sac.py), an erro…

NoneJou072 updated 4 days ago

unslothai/unsloth #1105

Tied weights like Llama 3.2 3B cannot save during checkpoint…

Hello, I tried to train Llama3.2 3B. It's a full finetune, not a lora, but Unsloth always crashes under varying conditions when the model should be saved. Hardware was runpod in all cases, different c…

kovern updated 2 weeks ago

SylphAI-Inc/AdalFlow #246

brittle parser in tgd_optimizer

**Describe the bug** ``` TypeError Traceback (most recent call last) Cell In[7], line 1 ----> 1 trainer.fit( 2 train_dataset=filtered_dataset.train, …

mrdrprofuroboros updated 1 week ago

unslothai/unsloth #1230

Why is memory bandwidth only half used? Is it possible we sp…

Hi thanks for the library! This is like a discussion (instead of an issue). It seems that when using unsloth or huggingface Trainer to full finetune ~1B model, the gpu utilization is >90%, while memor…

fzyzcjy updated 1 week ago

1000+ results for memory-trainer
