-
- Environment:
- WSL2, Ubuntu 22.04, 1x RTX 4090 GPU
- train_sft.sh
```bash
CUDA_VISIBLE_DEVICES=0 python dbgpt_hub/train/sft_train.py \
--model_name_or_path $model_name_or_path \
--quantizati…
```
-
### 🐛 Describe the bug
```python
@record
def training_function(args):
    # get some base rank info
    # metric = evaluate.load("glue", "mrpc")
    world_size = os.getenv("WORLD_SIZE")
    rank = os.gete…
```
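For context, here is a minimal sketch of how per-process rank information is typically read from the environment variables that `torchrun` exports; the function names below are illustrative and not taken from the snippet above:

```python
import os

import torch
import torch.distributed as dist


def get_rank_info():
    # torchrun (and torch.distributed.launch) export these for every worker process
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    rank = int(os.getenv("RANK", "0"))
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    return world_size, rank, local_rank


def setup_distributed():
    world_size, rank, local_rank = get_rank_info()
    if world_size > 1 and not dist.is_initialized():
        # NCCL is the usual backend for multi-GPU training
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(local_rank)
    return world_size, rank, local_rank
```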
-
Gradient accumulation (micro steps) can be very useful when we want a large effective batch size but only have a limited number of GPUs; a rough sketch is below.
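As an illustration (not this repo's implementation), gradient accumulation simply runs several micro batches and scales their losses before each optimizer step; the tiny model and data below are placeholders:

```python
import torch
from torch import nn

# tiny stand-ins so the loop is runnable; any model/optimizer works the same way
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
dataloader = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(32)]

accum_steps = 8  # micro steps per optimizer step; effective batch = 4 * 8 = 32

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    loss = loss_fn(model(inputs), labels)
    # scale so the accumulated gradient is the mean over all micro batches
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```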
-
Hello,
The problem is that the parameter `w` or `"{'test': {'combine_ratio': 0.6}}"` in the readme.md doesn't seem to work when running inference.
I tried setting values 0, 1, 0.1, 0.9 and compa…
-
![NG@T{Q JDW3%OVV{5 {04OL](https://github.com/user-attachments/assets/188f0cbc-32e6-4a60-94ad-0b44fdd752a9)
When we perform multi-machine, multi-GPU training, we get an out-of-memory err…
-
I see the warning below in the logs when running LoRA training; can it be ignored?
`/text-generation-webui-main/installer_files/env/lib/python3.11/site-packages/torch/utils/checkpoint.py:429: UserWar…
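Assuming this is the `use_reentrant` deprecation warning that recent torch versions emit from that line (the truncated message above doesn't confirm it), it is harmless and disappears once the flag is passed explicitly; a minimal sketch:

```python
import torch
from torch.utils.checkpoint import checkpoint

w = torch.randn(8, 8, requires_grad=True)

def block(x):
    return torch.relu(x @ w)

x = torch.randn(4, 8, requires_grad=True)
# Passing use_reentrant explicitly silences the warning; False selects the
# newer, non-reentrant checkpoint implementation.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

If the training enables checkpointing through a recent Hugging Face `transformers` version instead, the equivalent is usually passing `gradient_checkpointing_kwargs={"use_reentrant": False}`.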
-
### Describe the issue
Issue: Not able to fine-tune the LLaVA model with llava-v1.5-7b.
I am also sharing my arguments below; when I run the code it gives me the error:
size m…
-
```
{'_default_root_dir': '/data/DJL/DiffAD-main',
 '_fit_loop': ,
 '_is_data_prepared': False,
 '_lightning_optimizers': None,
 '_predict_loop': ,
 '_progress_bar_callback': ,
 '_stochastic_weight_avg': …
```
-
I used this code and trained with Korean ko-snil data.
adapter_config.json, adapter_model.safetensors, special_tokens_map.json, tokenizer_config.json, tokenizer.json, tokenizer.model
5 files wer…
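In case it helps, a minimal sketch of loading an adapter saved in that layout with PEFT; the base model name and adapter directory below are placeholders, not taken from this report:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "base-model-name"      # placeholder: the model the adapter was trained on
adapter_dir = "path/to/adapter_output"   # placeholder: folder with adapter_config.json etc.

tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
# attaches adapter_model.safetensors / adapter_config.json on top of the base weights
model = PeftModel.from_pretrained(base_model, adapter_dir)
model.eval()
```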
-
First, thank you for creating nanoGPT. It has been an amazing learning experience! I have a question about vocab size and training. I have built nanoGPT and ran the Shakespeare data with a vocab size …
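For reference, the character-level vocab size in nanoGPT comes straight from the set of characters in the training text; a rough sketch of roughly what `data/shakespeare_char/prepare.py` does (the file path below is a placeholder):

```python
# Derive a character-level vocab from the raw text, similar to nanoGPT's
# data/shakespeare_char/prepare.py; "input.txt" is a placeholder path.
with open("input.txt", "r", encoding="utf-8") as f:
    data = f.read()

chars = sorted(set(data))
vocab_size = len(chars)  # ~65 unique characters for the Shakespeare text

stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(vocab_size, decode(encode("hello")))
```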