artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Question: CUDA memory usage in the evaluation phase #261

Open LimboWK opened 1 year ago

LimboWK commented 1 year ago

I have customized SFT and evaluation scripts using QLoRA, but I run into a GPU out-of-memory error during the evaluation steps. Does anyone have the same issue, or any insights on how to reduce memory usage during eval?

The trainer and dataset setup looks like this:

```python
import transformers

#######################################################################
gradient_accumulation_steps = 4
per_device_train_batch_size = 4
per_device_eval_batch_size = 1

total_train_samples = len(train_data)
total_validation_samples = len(validation_data)
print("Total training samples:", total_train_samples)
print("Total validation samples:", total_validation_samples)

num_train_steps_per_epoch = (total_train_samples // per_device_train_batch_size // gradient_accumulation_steps)
print("num_train_steps_per_epoch:", num_train_steps_per_epoch)
num_train_epochs = 1
max_steps = int(num_train_epochs * num_train_steps_per_epoch)
print("Max steps:", max_steps)

# trainer
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    compute_metrics=compute_bleu_score,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=2,
        max_steps=max_steps,
        learning_rate=1e-4,
        evaluation_strategy="steps",
        eval_steps=50,
        save_steps=50,
        logging_steps=10,
        save_total_limit=2,
        fp16=True,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
)
model.config.use_cache = False
```

jonataslaw commented 1 year ago

per_device_train_batch_size = 1
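
One thing worth noting: when `compute_metrics` is set, the Trainer accumulates the predicted logits (batch × seq_len × vocab_size) across the whole eval set, and that accumulation, rather than the eval batch size, is often what runs out of CUDA memory. A minimal sketch, not from this repo, of the two stock Hugging Face knobs that address this (`eval_accumulation_steps` and `preprocess_logits_for_metrics`); the values are only illustrative, the variable names (`model`, `train_data`, `validation_data`, `tokenizer`, `compute_bleu_score`) are taken from the snippet above, and `compute_bleu_score` would then receive token ids instead of raw logits:

```python
import transformers

def preprocess_logits_for_metrics(logits, labels):
    # Keep only the predicted token ids instead of the full
    # (batch, seq_len, vocab_size) logit tensor, so far less memory
    # is accumulated during evaluation.
    if isinstance(logits, tuple):
        logits = logits[0]
    return logits.argmax(dim=-1)

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    compute_metrics=compute_bleu_score,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,      # as suggested above
        per_device_eval_batch_size=1,
        eval_accumulation_steps=4,          # move accumulated predictions to CPU every 4 eval steps
        gradient_accumulation_steps=4,
        evaluation_strategy="steps",
        eval_steps=50,
        fp16=True,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
)
```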

ChenMnZ commented 9 months ago

I also encountered this problem. Did you solve it later?