Model training question - Githubissues

Spico197 / DocEE

🕹️ A toolkit for document-level event extraction, containing some SOTA model implementations.

https://doc-ee.readthedocs.io/

MIT License

234 stars 36 forks

Model training question #31

Closed sauceplus closed 2 years ago

sauceplus commented 2 years ago

Agreement

[x] Fill the space in brackets with x to check the agreement items.
[x] Before submitting this issue, I've fully checked the instructions in README.md.
[x] Before submitting this issue, I'd searched in the issue area and didn't find a solved issue that covers my problem.
[x] This issue is about the toolkit itself, not Python, pip or other programming basics.
[x] I understand if I do not check all the agreement items above, my issue MAY BE CLOSED OR REMOVED WITHOUT FURTHER EXPLANATIONS.

Problem

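Hello, I'd like to retrain the PTPCG model. When I ran run_ptpcg.sh I found that my machine is underpowered, so I plan to rent time on a cloud platform to speed things up. I've read dee_task.py. If I run run_dee_task.py from the shell, will I get the trained model in the Exps folder? (I'm not sure why, but in dee_task.train(save_cpt_flag=in_argv.save_cpt_flag) the value of save_cpt_flag is False. Does that mean the model is not saved?)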

Spico197 commented 2 years ago

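Hi, save_cpt_flag=False means the model weights are not saved before each epoch's evaluation. However, run_ptpcg.sh sets save_best_cpt=True, so after each evaluation the best model is saved whenever the current epoch's result exceeds the best result so far. This reduces disk usage. You can also set save_cpt_flag to True, which saves a set of model weights every epoch. Either works.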
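The interplay of the two flags can be sketched roughly as follows. This is a minimal illustration and not DocEE's actual code: `train_loop`, `evaluate`, `save_checkpoint`, and the checkpoint names are hypothetical stand-ins, and only the flag names `save_cpt_flag` and `save_best_cpt` come from the discussion above.

```python
from typing import Callable, List


def train_loop(
    num_epochs: int,
    evaluate: Callable[[int], float],
    save_checkpoint: Callable[[str], None],
    save_cpt_flag: bool = False,
    save_best_cpt: bool = True,
) -> float:
    """Hypothetical loop showing when checkpoints would be written."""
    best_score = float("-inf")
    for epoch in range(1, num_epochs + 1):
        # ... one epoch of training would happen here ...
        if save_cpt_flag:
            # Save unconditionally before evaluation: one file per epoch.
            save_checkpoint(f"epoch_{epoch}")
        score = evaluate(epoch)
        if save_best_cpt and score > best_score:
            # Overwrite the single "best" checkpoint only on improvement.
            save_checkpoint("best")
        best_score = max(best_score, score)
    return best_score


if __name__ == "__main__":
    scores = {1: 0.40, 2: 0.55, 3: 0.50}  # made-up evaluation scores
    saved: List[str] = []
    best = train_loop(3, scores.__getitem__, saved.append)
    print(saved, best)  # prints ['best', 'best'] 0.55
```

With the defaults, only improving epochs trigger a save, which is why disk usage stays low. Passing `save_cpt_flag=True` in this sketch would additionally write one checkpoint per epoch.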

sauceplus commented 2 years ago

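[Screenshot: QQ截图20220428210940] That's really clever! As shown in the screenshot, by the time execution reaches "if in_argv.run_inference:", has the model already finished training and been written out? And what is the purpose of this "run_inference" branch?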

Spico197 commented 2 years ago

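Yes. Unless skip_train is set, the model has already finished training by the time run_inference is reached.
run_inference is an interface built for competitions. It reads the file given by inference_file_name and writes the results to inference_dump_filepath; see the relevant code for details. If you are running experiments on the Qianyan DuEE-fin dataset, you need to generate a result file, process it with dueefin_post_process.py, and submit it to the system for online evaluation.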
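The control flow described above can be pictured with a small sketch. It is not the real run_dee_task.py: `build_parser` and `plan_steps` are hypothetical, and treating `skip_train` and `run_inference` as `store_true` switches is an assumption (the actual script may parse them differently). Only the four option names come from the thread.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser; the real script defines many more arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument("--skip_train", action="store_true")
    parser.add_argument("--run_inference", action="store_true")
    parser.add_argument("--inference_file_name", default="")
    parser.add_argument("--inference_dump_filepath", default="")
    return parser


def plan_steps(argv: List[str]) -> List[str]:
    """Return the stages that would run for the given command line."""
    in_argv = build_parser().parse_args(argv)
    steps = []
    if not in_argv.skip_train:
        # Training finishes before the inference branch is reached.
        steps.append("train")
    if in_argv.run_inference:
        # Optional competition-style pass: read one file, dump predictions.
        steps.append(
            f"infer {in_argv.inference_file_name} -> {in_argv.inference_dump_filepath}"
        )
    return steps


if __name__ == "__main__":
    print(plan_steps([]))  # prints ['train']
```

In this sketch, leaving out `--run_inference` means the inference branch never executes, so a training-only run is unaffected by it.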

sauceplus commented 2 years ago

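I see. I had been staring at inference_dump_filepath for a while and thought you had added some extra testing method, haha. [screenshot] In that case, since my goal is only to train the model and write it out, can I remove the five "if" blocks I highlighted in blue?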

Spico197 commented 2 years ago

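Keep everything from line 320 onward in the screenshot; that part outputs the final evaluation results. The other highlighted parts can be deleted or configured through the corresponding argparse settings. If you run things via run_ptpcg.sh, leaving them in is also harmless, because they will not be called.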

sauceplus commented 2 years ago

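Got it. The cloud platform does not support shell, so I plan to move the parameters from your shell script into a Python script. One last confirmation: as long as I train following the steps in run_ptpcg.sh, I will get a model that can be used for prediction, right? (The cloud platform bills by the hour, so I'm worried about wasting a training run.)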

Spico197 commented 2 years ago

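Yes. If you want to be sure, you can draw 48 documents from train.json, name the file sample_train_48.json, and process it with stat.py to generate typed_sample_train_48.json. Alternatively, draw some instances directly from typed_train.json and name the result typed_sample_train_48.json. Once typed_sample_train_48.json is in the Data directory, set num_train_epochs to 1 and run_mode to "debug", then run it to check that everything works end to end. This test only takes a few minutes.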
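Drawing the sample could look something like the sketch below. It assumes the source file is a single JSON array with one entry per document, which should be checked against the actual data. `sample_documents` and its `seed` parameter are hypothetical helpers, not part of the toolkit.

```python
import json
import random
from pathlib import Path


def sample_documents(src: Path, dst: Path, k: int = 48, seed: int = 0) -> int:
    """Copy k randomly chosen top-level entries from a JSON array file."""
    docs = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(docs, list):
        raise ValueError("expected a JSON array of documents")
    k = min(k, len(docs))  # small files are copied whole
    picked = random.Random(seed).sample(docs, k)
    # ensure_ascii=False keeps Chinese text readable in the output file.
    dst.write_text(json.dumps(picked, ensure_ascii=False), encoding="utf-8")
    return k
```

Under that assumption, a call such as `sample_documents(Path("Data/typed_train.json"), Path("Data/typed_sample_train_48.json"))` would produce the debug file directly from the already typed data, so the stat.py step is not needed for this route.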

sauceplus commented 2 years ago

[screenshot] You have already prepared the 48 debug samples here. Can I use them directly?

Spico197 commented 2 years ago

Yes, you can.

sauceplus commented 2 years ago

[screenshot] Woohoo! Thank you!!

sauceplus commented 2 years ago

Sorry to bother you again. I have the following questions: [screenshots]

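1. In the screenshot above, from e1 to e4 the model's loss drops from 414 to 1.82. Does the loss correctly reflect the model's accuracy?
2. In the DocEE project, is everything that contains "inference" something you used for taking part in the competition, and unrelated to model training?

[screenshot]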