ZJULearning / ReDR

Code for ACL 2019 paper "Reinforced Dynamic Reasoning for Conversational Question Generation".

Train error I can't resolve #3

Open TingFree opened 5 years ago

TingFree commented 5 years ago

I ran your code in Google Colab, but I hit a critical error that I couldn't resolve by googling; maybe you can help me, thanks. PS: all the packages you mention are installed.

[2019-09-14 02:57:15,574 INFO]  * src vocab size = 49766
[2019-09-14 02:57:15,574 INFO]  * history vocab size = 49766
[2019-09-14 02:57:15,574 INFO]  * tgt vocab size = 49766
[2019-09-14 02:57:15,574 INFO] Building model...
Traceback (most recent call last):
  File "/content/drive/My Drive/ReDR/train.py", line 109, in <module>
    main(opt)
  File "/content/drive/My Drive/ReDR/train.py", line 41, in main
    single_main(opt, -1)
  File "/content/drive/My Drive/ReDR/onmt/train_single.py", line 86, in main
    model = build_model(model_opt, opt, fields, checkpoint)
  File "/content/drive/My Drive/ReDR/onmt/model_builder.py", line 235, in build_model
    vocab = torch.load(opt.drqa_vocab_path)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 574, in _load
    result = unpickler.load()
AttributeError: Can't get attribute '_default_unk_index' on <module 'torchtext.vocab' from '/usr/local/lib/python3.6/dist-packages/torchtext/vocab.py'>
TingFree commented 5 years ago

In fact, when I torch.load the 'drqa_param/vocab.pt' file directly, the same exception is raised. Is the .pt file wrong? The test code is as follows:

import torch
vocab = torch.load('./drqa_param/vocab.pt')

Traceback (most recent call last):
  File "G:/NLP/code/cqg_ReDR/just_test.py", line 2, in <module>
    vocab = torch.load('./drqa_param/vocab.pt')
  File "D:\Python\lib\site-packages\torch\serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "D:\Python\lib\site-packages\torch\serialization.py", line 542, in _load
    result = unpickler.load()
AttributeError: Can't get attribute '_default_unk_index' on <module 'torchtext.vocab' from 'D:\\Python\\lib\\site-packages\\torchtext\\vocab.py'>
kouhonglady commented 5 years ago

Have you solved this problem? I have run into the same one and don't know how to fix it. Help~

TingFree commented 5 years ago

@kouhonglady When I changed the torchtext version to 0.3.1, a different error was reported, so I guess it may be a version problem. After that I was too busy with other competitions to keep investigating. If you solve this problem, please let me know; likewise, I will share my method once I get this code running properly. I hope so~

kouhonglady commented 5 years ago

I solved this problem by switching to torchtext 0.3.1 and Python 2.7.

AlbertChen1991 commented 4 years ago

@kouhonglady @Dingjiajie Hi, have you corrected the errors?

AlbertChen1991 commented 4 years ago

Hello, @kouhonglady @Dingjiajie. I have fixed the error on Python 3.7. Just install torchtext==0.4.0 and add the following code to the model_builder.py file.

from torchtext import vocab

try:
    vocab._default_unk_index
except AttributeError:
    def _default_unk_index():
        return 0
    vocab._default_unk_index = _default_unk_index
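
The shim appears to work because the installed torchtext no longer defines the module-level _default_unk_index function that the pickled vocab.pt references, so restoring a stand-in lets unpickling resolve the attribute. A quick sanity check (a sketch, assuming the shim above has already run in the same process; the printed type depends on how the file was saved):

import torch

loaded = torch.load('./drqa_param/vocab.pt')  # should no longer raise the AttributeError
print(type(loaded))  # inspect what was actually pickled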

ZhiliWang commented 4 years ago

> I have fixed the error on Python 3.7. Just install torchtext==0.4.0 and add the code above to the model_builder.py file.

@AlbertChen1991

This solves the OP's issue but triggered another runtime error for me:

/opt/anaconda3/envs/py36/lib/python3.6/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  var = torch.tensor(arr, dtype=self.dtype, device=device)
Traceback (most recent call last):
  File "train.py", line 109, in <module>
    main(opt)
  File "train.py", line 41, in main
    single_main(opt, -1)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/train_single.py", line 134, in main
    valid_steps=opt.valid_steps)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/trainer.py", line 217, in train
    report_stats, local_step)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/trainer.py", line 348, in _gradient_accumulation
    outputs, attns, results = self.model(batch, src, history, tgt, src_lengths, history_lengths, bptt=bptt)
  File "/opt/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/models/model.py", line 70, in forward
    memory_lengths=src_lengths)
  File "/opt/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/decoders/decoder.py", line 210, in forward
    tgt, memory_bank, memory_lengths=memory_lengths)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/decoders/decoder.py", line 388, in _run_forward_pass
    memory_lengths=memory_lengths)
  File "/opt/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/zhiliwang/Documents/nlp/ReDR/onmt/modules/global_attention.py", line 183, in forward
    align.masked_fill_(1 - mask, -float('inf'))
  File "/opt/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 394, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.

AlbertChen1991 commented 4 years ago

Just follow the suggestion: use ~mask instead of 1 - mask.
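
A minimal, self-contained illustration of that change (the tensor values here are made up; in the repo the patch goes on the masked_fill_ call in onmt/modules/global_attention.py shown in the traceback):

import torch

align = torch.zeros(2, 3)
mask = torch.tensor([[True, True, False], [True, False, False]])
# align.masked_fill_(1 - mask, -float('inf'))  # fails: '-' is unsupported on bool tensors
align.masked_fill_(~mask, -float('inf'))  # '~' inverts the bool mask instead
print(align)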

dxlong2000 commented 2 years ago

@AlbertChen1991 Thanks for your support.

After fixing the 1 - mask issue, I got another error, shown below. Could you suggest a solution? Thank you very much!

Traceback (most recent call last):
  File "train.py", line 109, in <module>
    main(opt)
  File "train.py", line 41, in main
    single_main(opt, -1)
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/train_single.py", line 134, in main
    valid_steps=opt.valid_steps)
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/trainer.py", line 217, in train
    report_stats, local_step)
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/trainer.py", line 348, in _gradient_accumulation
    outputs, attns, results = self.model(batch, src, history, tgt, src_lengths, history_lengths, bptt=bptt)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/models/model.py", line 72, in forward
    results = self.translator.translate_batch(batch, vocabs, False, training=self.train())
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/translate/translator.py", line 539, in translate_batch
    training=training)
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/translate/translator.py", line 695, in _translate_batch
    beam.advance(log_probs, attn, dec_out_attn)
  File "/content/gdrive/MyDrive/A/factoid_one_focus/CoQA/baselines/ReDR/onmt/translate/beam_search.py", line 182, in advance
    [self.alive_seq.index_select(0, self.select_indices),
RuntimeError: index_select(): Expected dtype int32 or int64 for index
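
The message points at the cause: recent PyTorch only accepts int32 or int64 index tensors in index_select, so self.select_indices presumably arrives with some other dtype. A self-contained sketch of the failure and the usual remedy, casting the index tensor to .long(); applying that cast to select_indices in onmt/translate/beam_search.py is an assumption, not a fix confirmed in this thread:

import torch

alive_seq = torch.arange(12).view(4, 3)
select_indices = torch.tensor([0, 2], dtype=torch.int16)  # a non-int32/int64 dtype reproduces the error
# alive_seq.index_select(0, select_indices)  # RuntimeError: index_select(): Expected dtype int32 or int64 for index
rows = alive_seq.index_select(0, select_indices.long())  # casting to int64 resolves it
print(rows)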