-
I am training [glm-10b-chinese](https://huggingface.co/THUDM/glm-10b-chinese/blob/main/config.json) for step-1.
In theory, with 10B parameters in fp32, the total memory occupied should be:
* params : 40GB
* …
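For reference, a back-of-the-envelope sketch of how those numbers add up for plain fp32 training with Adam (the Adam optimizer is an assumption here; only the 10B parameter count comes from the post):

```python
# Rough per-parameter cost for fp32 training with Adam (assumed):
#   4 B weights + 4 B gradients + 8 B optimizer states (m and v).
def fp32_adam_memory_gb(num_params: float) -> dict:
    return {
        "params_gb": num_params * 4 / 1e9,     # fp32 weights
        "grads_gb": num_params * 4 / 1e9,      # fp32 gradients
        "optimizer_gb": num_params * 8 / 1e9,  # Adam m and v in fp32
    }

print(fp32_adam_memory_gb(10e9))
# -> {'params_gb': 40.0, 'grads_gb': 40.0, 'optimizer_gb': 80.0}
```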
-
### System Info
```Shell
accelerate 0.20.3
python 3.10
numpy 1.24.3
torch 2.0.1
accelerate config:
compute_environment: LOCAL_MACHINE
deepspeed_config:
deepspeed_multinode_launcher: stand…
-
Hello, I get an error when launching single-machine multi-GPU training with the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/train_bash.py \
--stage sft \
--model_name_or_path path_to_your_model \
--do_train \
--dataset alpaca_gpt4_zh \
…
-
Thanks for your great work. In your paper, the batch size is 16 during tuning. How do I set the batch size to 16? Should I change per_device_train_batch_size from its default of 1 to 16?
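For context, in HF-style trainers the effective (global) batch size is the product of the per-device batch size, the gradient accumulation steps, and the number of GPUs. A minimal sketch with made-up values (not taken from the paper):

```python
# Effective batch size = per-device batch size x gradient accumulation x number of GPUs.
per_device_train_batch_size = 1   # common default in example scripts
gradient_accumulation_steps = 4   # hypothetical value
num_gpus = 4                      # hypothetical value

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 16
```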
-
While fine-tuning with LoRA, I noticed that the save directory contains a file mp_rank_00_model_states.pt of about 32GB, plus a bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt of about 900MB. I'm confused: LoRA should only save the parameters it actually trains.
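For reference, a minimal sketch of how a LoRA-only checkpoint is usually written through PEFT (this assumes the model is wrapped with peft.get_peft_model; the large files above are DeepSpeed checkpoint files, and DeepSpeed's own checkpointing saves full model and optimizer states):

```python
# Minimal PEFT sketch (assumed setup, not the actual training script):
# saving the PEFT-wrapped model writes only the adapter weights, not the base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path_to_your_model")  # placeholder path
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # hypothetical target modules
)
model = get_peft_model(base, lora_config)

model.save_pretrained("lora_adapter_only")  # writes adapter_config.json + adapter weights
```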
My finetune_lora.sh is as follows:
```shell
#!/bin/bash
expo…
-
```
Traceback (most recent call last):
File "/workspace/kohya_ss/sd-scripts/train_db.py", line 529, in
train(args)
File "/workspace/kohya_ss/sd-scripts/train_db.py", line 190, in train
…
-
I think #357 should be applied to the pretrain script as well.
Thank you so much, Lightning team, for this amazing repository.
-
When I use the LongAlpaca-12k dataset for supervised fine-tuning of the LongAlpaca-7B model, the loss is very unstable.
My command is:
```
Miniconda/envs/longlora/bin/python -u supervised-fine-tun…
-
I'm trying to run train_hunyuan_lora_ui.py and am getting the following error:
``` log
python train_hunyuan_lora_ui.py --seed 12151004 --logging_dir logs --mixed_precision bf16 --report_to wandb --lr_wa…
-
Training the UNet...
[ASCII-art "TRAINING" banner from the log omitted]