Mrwangkedong / EVA2.0_fintune

Chinese pre-trained dialogue model

Problem running the eva_finetuning program #3

Open zzzzhaozhao opened 1 year ago

zzzzhaozhao commented 1 year ago

Hello, and thank you very much for open-sourcing this program. When I run it, however, I hit the following problem. Have you seen it before?

Evaluating: 100%|████████████████████████████████████████████████████████████| 1988/1988 [38:49<00:00, 1.17s/it]
Training: 14%|████████▌ | 179/1267 [41:45<4:13:48, 14.00s/it]
Traceback (most recent call last):
  File "/home/EVA2.0_fintune/src/eva_finetuning.py", line 180, in <module>
    train(args, model, tokenizer, optimizer, scheduler, train_dataloader, valid_dataloader, test_dataloader, device)
  File "/home/EVA2.0_fintune/src/eva_finetuning.py", line 127, in train
    valid_loss, valid_metric_res, generation_res = evaluate(args, tokenizer, eval_data_loader=valid_dataloader, model=model, device=device)
  File "/home/EVA2.0_fintune/src/eva_evaluate.py", line 118, in evaluate
    metric_res, *_ = metric.close()
  File "/home/EVA2.0_fintune/src/generation_metrics.py", line 227, in close
    f1, scores = self.calc_unigram_f1()
  File "/home/EVA2.0_fintune/src/generation_metrics.py", line 133, in calc_unigram_f1
    r = cross / len(ref)
ZeroDivisionError: division by zero

I am using my own dataset, in which each sample is a single-turn dialogue. No matter how many samples I use, the data is first split in two, and training then always gets stuck at 179/**** in the second round with the error above. What could be causing this? Many thanks!
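For context on where the crash comes from: unigram F1 is commonly defined as below. This assumes the usual clipped-overlap form, where cross counts the tokens shared by the generation (hyp) and the reference (ref); only the line `r = cross / len(ref)` is confirmed by the traceback, and the rest of `calc_unigram_f1` is an assumption.

```latex
\mathrm{cross} = \sum_{w} \min\bigl(c_{\mathrm{hyp}}(w),\, c_{\mathrm{ref}}(w)\bigr), \qquad
p = \frac{\mathrm{cross}}{|\mathrm{hyp}|}, \qquad
r = \frac{\mathrm{cross}}{|\mathrm{ref}|}, \qquad
F_1 = \frac{2pr}{p + r}
```

Under this definition, r is undefined whenever the tokenized reference is empty (|ref| = 0), which matches the ZeroDivisionError at line 133. An empty generation would make p undefined in the same way, and the final F_1 step would also divide by zero when cross = 0 unless it is guarded.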

zzzzhaozhao commented 1 year ago

I think I have a rough idea of where the problem lies. Perhaps these parameters in eva_finetuning.sh need to match the dataset?

VALID_STEP_NUM=180
LOSS_STEP_NUM=120
BATCH_SIZE=2
EPOCHS=5

Mrwangkedong commented 1 year ago

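You can set the STEP parameters however you like. The error may be caused by the model generating an empty string, which then breaks validation. Try setting num_beam to a value greater than 1.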
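To tell whether the empty side is the generation or the reference from the custom dataset, one option is to check both just before the metrics are computed. Since the failing line divides by `len(ref)`, an empty reference (for example a blank response in a single-turn dataset) is worth ruling out as well. This is a minimal sketch: `empty_side_report` is a hypothetical helper, and it assumes the generations and references are available as parallel lists of token lists, which may not match how eva_evaluate.py actually stores them.

```python
from typing import Dict, List, Sequence


def empty_side_report(
    hyps: Sequence[Sequence[str]], refs: Sequence[Sequence[str]]
) -> Dict[str, List[int]]:
    """Report which validation samples have an empty generation or an empty reference.

    Hypothetical helper: `hyps` and `refs` are parallel lists of already-tokenized
    sequences, collected by hand right before metric.close() is called.
    """
    if len(hyps) != len(refs):
        raise ValueError(f"length mismatch: {len(hyps)} hyps vs {len(refs)} refs")
    report: Dict[str, List[int]] = {"empty_hyp": [], "empty_ref": []}
    for i, (hyp, ref) in enumerate(zip(hyps, refs)):
        if len(hyp) == 0:
            report["empty_hyp"].append(i)  # the model generated nothing for sample i
        if len(ref) == 0:
            report["empty_ref"].append(i)  # the reference response for sample i is empty
    return report


if __name__ == "__main__":
    hyps = [["你", "好"], [], ["好"]]
    refs = [["你", "好"], ["嗯"], []]
    print(empty_side_report(hyps, refs))  # {'empty_hyp': [1], 'empty_ref': [2]}
```

If only `empty_hyp` is non-empty, the num_beam suggestion is the relevant lever; if `empty_ref` is non-empty, the dataset itself contains blank responses.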

zzzzhaozhao commented 1 year ago

I suspect that validation uses one extra batch of data, because when I changed the parameters to

VALID_STEP_NUM=4
LOSS_STEP_NUM=120
BATCH_SIZE=4
EPOCHS=5

the error became:

Evaluating: 100%|████████████████████████████████████████████████████████████| 2002/2002 [41:12<00:00, 1.24s/it]
Training: 0%| | 3/4775 [45:43<1212:20:07, 914.59s/it]
Traceback (most recent call last):
  File "/home/EVA2.0_fintune/src/eva_finetuning.py", line 184, in <module>
    train(args, model, tokenizer, optimizer, scheduler, train_dataloader, valid_dataloader, test_dataloader, device)
  File "/home/EVA2.0_fintune/src/eva_finetuning.py", line 130, in train
    valid_loss, valid_metric_res, generation_res = evaluate(args, tokenizer,
  File "/home/EVA2.0_fintune/src/eva_evaluate.py", line 118, in evaluate
    metric_res, *_ = metric.close()
  File "/home/EVA2.0_fintune/src/generation_metrics.py", line 227, in close
    f1, scores = self.calc_unigram_f1()
  File "/home/EVA2.0_fintune/src/generation_metrics.py", line 133, in calc_unigram_f1
    r = cross / len(ref)
ZeroDivisionError: division by zero

From what I can see, with VALID_STEP_NUM=180 the extra batch makes training stall at 179, and with VALID_STEP_NUM=4 it stalls at 3/****?
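The stall positions 179 and 3 are also consistent with validation simply being triggered on the VALID_STEP_NUM-th training step, with no extra batch consumed. The sketch below assumes a trigger of the form `(step + 1) % VALID_STEP_NUM == 0` inside a tqdm-wrapped loop; that condition is an assumption, and the actual check in eva_finetuning.py may be written differently. `progress_at_validation` is a hypothetical helper that only simulates the counter.

```python
from typing import List


def progress_at_validation(num_steps: int, valid_step_num: int) -> List[int]:
    """Return the tqdm counter values displayed while validation is running.

    Assumes (hypothetically) a training loop shaped like:
        for step, batch in enumerate(tqdm(train_dataloader)):
            ...train on batch...
            if (step + 1) % valid_step_num == 0:
                evaluate(...)
    tqdm increments its counter only when the next batch is requested, so while
    evaluate() runs inside 0-based step `step`, the bar shows `step` completed.
    """
    shown = []
    for step in range(num_steps):
        if (step + 1) % valid_step_num == 0:
            shown.append(step)  # counter value visible during this validation run
    return shown


if __name__ == "__main__":
    print(progress_at_validation(1267, 180)[:2])  # [179, 359]
    print(progress_at_validation(4775, 4)[:2])    # [3, 7]
```

Under this assumption the bar position only tells you when validation started, not that an extra batch was used. That agrees with the crash happening inside the metric computation rather than in the training loop.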

Mrwangkedong commented 1 year ago

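Judging from the traceback, the error happens after generation on the validation set has finished, while the metrics are being computed. It doesn't appear to be related to VALID_STEP_NUM.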
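Whatever the root cause turns out to be, the metric itself can be made robust to empty sequences. Below is a minimal sketch of a guarded unigram F1 that assumes the usual clipped-overlap definition. `unigram_f1` and `corpus_unigram_f1` are hypothetical stand-ins, not the actual `calc_unigram_f1` in generation_metrics.py, although the corpus-level function returns an `(f1, scores)` pair like the call in the traceback.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def unigram_f1(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Unigram F1 between a generated token sequence and a reference token sequence.

    Returns 0.0 instead of raising ZeroDivisionError when either side is empty
    or when the two sides share no tokens.
    """
    if len(hyp) == 0 or len(ref) == 0:
        return 0.0
    # Clipped overlap: a token counts at most min(count in hyp, count in ref) times.
    cross = sum((Counter(hyp) & Counter(ref)).values())
    if cross == 0:
        return 0.0  # also avoids dividing by p + r == 0 below
    p = cross / len(hyp)
    r = cross / len(ref)
    return 2 * p * r / (p + r)


def corpus_unigram_f1(
    pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, List[float]]:
    """Average unigram F1 over (hyp, ref) pairs; 0.0 for an empty validation set."""
    scores = [unigram_f1(hyp, ref) for hyp, ref in pairs]
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, scores


if __name__ == "__main__":
    print(unigram_f1(["a", "b"], ["a", "c"]))  # 0.5
    print(unigram_f1(["a", "b"], []))          # 0.0, no ZeroDivisionError
```

Scoring an empty pair as 0 keeps it in the average as a failure. Skipping such pairs instead would avoid the crash too, but it would hide empty generations from the reported F1.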