Open deepTransformer opened 4 years ago
You'll want to re-run with the following environment variable: CUDA_LAUNCH_BLOCKING=1. This should give you a more informative error message.
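For reference, a minimal way to set it (a sketch assuming you launch generation from a Python entry point; you can equally prefix your shell command with the variable):

import os

# CUDA kernels launch asynchronously by default, so the Python traceback usually points
# at the wrong line; blocking launches make the failing kernel report itself directly.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set this before the first CUDA call

import torch  # import torch / fairseq only after the variable is set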
/opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/ATen/native/cuda/MultinomialKernel.cu:256: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [4,0,0], thread: [0,0,0] Assertion `sum > accZero` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/generic/THCTensorScatterGather.cu line=71 error=59 : device-side assert triggered
Traceback (most recent call last):
File ".//generate.py", line 11, in <module>
cli_main()
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq_cli/generate.py", line 269, in cli_main
main(args)
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq_cli/generate.py", line 36, in main
return _main(args, sys.stdout)
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq_cli/generate.py", line 145, in _main
hypos = task.inference_step(generator, models, sample, prefix_tokens)
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq/tasks/fairseq_task.py", line 356, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/research/miniconda3/envs/torch1.4/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
return func(*args, **kwargs)
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq/sequence_generator.py", line 161, in generate
return self._generate(sample, **kwargs)
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq/sequence_generator.py", line 310, in _generate
scores.view(bsz, beam_size, -1)[:, :, :step],
File "/home/research/haha/research-transfer-dialouge/fairseq/fairseq/search.py", line 257, in step
probs, dim=2, index=indices_buf.unsqueeze(-1)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/generic/THCTensorScatterGather.cu:71
I have the same error.
Traceback (most recent call last):
File "reddit_lm.py", line 94, in <module>
output = reddit.predict(0, input(">>> : "))
File "reddit_lm.py", line 77, in predict
no_repeat_ngram_size=4,
File "/opt/conda/lib/python3.7/site-packages/fairseq/hub_utils.py", line 127, in sample
return self.sample([sentences], beam=beam, verbose=verbose, **kwargs)[0]
File "/opt/conda/lib/python3.7/site-packages/fairseq/hub_utils.py", line 129, in sample
batched_hypos = self.generate(tokenized_sentences, beam, verbose, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/hub_utils.py", line 170, in generate
generator, self.models, batch, **inference_step_args
File "/opt/conda/lib/python3.7/site-packages/fairseq/tasks/language_modeling.py", line 314, in inference_step
models, sample, prefix_tokens=prefix_tokens, bos_token=bos_token
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 177, in generate
return self._generate(sample, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 378, in _generate
original_batch_idxs,
File "/opt/conda/lib/python3.7/site-packages/fairseq/search.py", line 714, in step
replacement=True,
RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
I have the same problem.
Is there any way to avoid this, for example by modifying the logits before the softmax?
This problem can be solved by assigning 1 to the eos element of lprobs in fairseq/sequence_generator.py.
Before:
# handle max length constraint
if step >= max_len:
    lprobs[:, : self.eos] = -math.inf
    lprobs[:, self.eos + 1 :] = -math.inf
After:
# handle max length constraint
if step >= max_len:
    lprobs[:, : self.eos] = -math.inf
    lprobs[:, self.eos + 1 :] = -math.inf
    lprobs[:, self.eos] = 1
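For what it's worth, a quick standalone check (hypothetical vocabulary size and eos index, not fairseq code) of why this helps: the sampling search in the traceback above ends up calling torch.multinomial on the exponentiated lprobs, and a row that is entirely -math.inf exponentiates to all zeros, which is exactly the "sum of probabilities <= 0" failure. Keeping the eos column finite leaves one nonzero probability to draw:

import math
import torch

eos = 2                                   # hypothetical eos index
lprobs = torch.full((1, 8), -math.inf)    # every token masked out: the broken case
# torch.multinomial(lprobs.exp(), 1)      # raises: invalid multinomial distribution
#                                         #         (sum of probabilities <= 0)

lprobs[:, eos] = 1                        # the patch above keeps eos finite
next_token = torch.multinomial(lprobs.exp(), num_samples=1)
print(next_token)                         # always returns the eos index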
I have the same problem. Is there any solution that doesn't require modifying the source code of fairseq/sequence_generator.py?
🐛 Bug
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
processed_convai2_none/bin is the data path, and checkpoint9.pt is the model I trained.
Code sample
Expected behavior
In sequence_generator.py, when step == max_len, every element of lprobs except eos is assigned -math.inf (the "handle max length constraint" block quoted earlier in this thread), so we expect the generator to generate the eos token.
However, sometimes all elements of lprobs, including eos, are assigned -math.inf. When the sampling code in search.py then runs, I encountered the error above, because all elements of lprobs are 0 at that point and torch.multinomial cannot sample from an all-zero distribution.
Environment
How you installed fairseq (pip, source): source
Additional context
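As a minimal, self-contained reproduction of just the multinomial failure outside fairseq (hypothetical tensor shape; in the real run lprobs comes from the model):

import math
import torch

# when step >= max_len every column gets -math.inf, including eos in the buggy case
lprobs = torch.full((2, 16), -math.inf)
probs = lprobs.exp()                      # all zeros, so no valid distribution remains
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as err:
    print(err)                            # invalid multinomial distribution (sum of probabilities <= 0)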