RichardHGL / WSDM2021_NSM

Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.

AssertionError: assert not torch.isnan(f2e_emb).any() #26

Open crooooked opened 12 months ago

crooooked commented 12 months ago

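Hello! While running python main_teacher.py, I get an AssertionError at line 56 of NSM/Modules/layer_nsm.py because f2e_emb contains NaN values. The error is raised right after the first gradient update during training. How can I fix this?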

The log is as follows:

Traceback (most recent call last):
  File "main_teacher.py", line 136, in <module>
    main()
  File "main_teacher.py", line 124, in main
    trainer.train(0, args.num_epoch - 1)
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/train/trainer_hybrid.py", line 83, in train
    loss, extras, h1_list_all, f1_list_all = self.train_epoch()
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/train/trainer_hybrid.py", line 161, in train_epoch
    loss, extras, _, tp_list = self.student(batch, training=True)
  File "/home/zhangzy/miniconda3/envs/MSQA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/Agent/TeacherAgent.py", line 24, in forward
    return self.model(batch, training=training)
  File "/home/zhangzy/miniconda3/envs/MSQA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/Model/hybrid_model.py", line 143, in forward
    self.init_reason(curr_dist=current_dist, local_entity=local_entity,
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/Model/hybrid_model.py", line 53, in init_reason
    self.local_entity_emb = self.get_ent_init(local_entity, kb_adj_mat, self.rel_features)
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/Model/base_model.py", line 124, in get_ent_init
    local_entity_emb = self.type_layer(local_entity=local_entity,
  File "/home/zhangzy/miniconda3/envs/MSQA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangzy/WSDM2021_NSM-main/NSM/Modules/layer_nsm.py", line 56, in forward
    assert not torch.isnan(f2e_emb).any()
AssertionError

RichardHGL commented 12 months ago

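Hello, we have never run into this problem before. You could first try the ckpt we provide. You could also try lowering the learning rate, or print the maximum absolute value of the gradients to look for anomalies.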
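For the last suggestion, a minimal sketch of what printing the largest absolute gradient values could look like in PyTorch. The helper name `report_grad_extremes` and its `top_k` parameter are hypothetical and not part of the NSM code base, and the small `nn.Linear` model below only stands in for the real model. The idea is to call such a helper after `loss.backward()` and before the optimizer step, so that exploding or non-finite gradients show up before they are written into the weights.

```python
import torch
import torch.nn as nn


def report_grad_extremes(model: nn.Module, top_k: int = 5):
    """Return up to top_k (name, max_abs_grad, is_finite) tuples.

    Parameters whose gradient contains NaN/inf are listed first,
    followed by the remaining ones in descending order of max |grad|.
    Hypothetical helper for illustration only.
    """
    stats = []
    for name, param in model.named_parameters():
        grad = param.grad
        if grad is None or grad.numel() == 0:
            continue  # parameter did not receive a gradient in this step
        grad = grad.detach()
        is_finite = bool(torch.isfinite(grad).all())
        max_abs = grad.abs().max().item()
        stats.append((name, max_abs, is_finite))
    # False sorts before True, so non-finite gradients come first.
    stats.sort(key=lambda s: (s[2], -s[1] if s[2] else 0.0))
    return stats[:top_k]


# Stand-in model and loss (assumption: any nn.Module is inspected the same way).
torch.manual_seed(0)
demo_model = nn.Linear(4, 2)
demo_loss = demo_model(torch.randn(8, 4)).pow(2).mean()
demo_loss.backward()

for name, max_abs, is_finite in report_grad_extremes(demo_model):
    flag = "" if is_finite else "  <-- contains NaN/inf"
    print(f"{name}: max |grad| = {max_abs:.4e}{flag}")
```

If one parameter's value is orders of magnitude larger than the rest, or is flagged as non-finite on the very first step, that parameter is a reasonable place to start looking; a smaller learning rate may then be worth trying as suggested above.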

crooooked commented 12 months ago

Thank you for your reply! Where exactly is the provided ckpt located? I could not find it in the GitHub project.

crooooked commented 12 months ago

Sorry about that! I found the ckpt.