I found a fix for this: sending the tensors blank, fill, and score to the device of scores fixes it. Change OpenNMT-py/onmt/modules/copy_generator.py lines 24 to 30 to:
```python
if blank:
    # Move the index tensors to the same device as `scores`; otherwise
    # index_add_/index_fill_ mix CPU and CUDA tensors and raise an error.
    blank = torch.Tensor(blank).to(torch.int64).to(scores.device)
    fill = torch.Tensor(fill).to(torch.int64).to(scores.device)
    # `score` is a view of `scores`, so it is already on the right device.
    score = scores[:, b] if batch_dim == 1 else scores[b]
    score.index_add_(1, fill, score.index_select(1, blank))
    score.index_fill_(1, blank, 1e-10)
```
This should fix the issue, though I feel it is a temporary workaround rather than the best solution. Should I open a PR for this?
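A slightly cleaner variant (just a sketch, assuming the surrounding collapse_copy_scores code is otherwise unchanged) would build the index tensors directly on the target device with the lowercase `torch.tensor` factory, which accepts `dtype` and `device` arguments:

```python
if blank:
    # Create the indices on the right device and with the right dtype in
    # one step, instead of chaining two .to(...) calls.
    blank = torch.tensor(blank, dtype=torch.long, device=scores.device)
    fill = torch.tensor(fill, dtype=torch.long, device=scores.device)
    score = scores[:, b] if batch_dim == 1 else scores[b]
    score.index_add_(1, fill, score.index_select(1, blank))
    score.index_fill_(1, blank, 1e-10)
```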
Please read the project's README: we are no longer supporting OpenNMT-py and are switching to https://github.com/eole-nlp/eole. However, bear in mind that we dropped copy attention in EOLE, as it does not bring improvement, especially with transformers. I suggest you switch to EOLE if you intend to get support in the future.
Oh, thank you. I noticed that performance didn't improve after fixing the issue. Thank you for your work!
I tried training OpenNMT-py with the following config. With copy_attn set to False, everything trains fine, but when I set copy_attn to True, it produces the error log below.
The config:
This is the error log it produces:
I tried looking into the code, and model.to(device) is performed. Any idea why this could be happening? Thank you.
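For context: model.to(device) only moves the model's registered parameters and buffers. Tensors created at runtime from plain Python lists (like blank and fill in the fix above) start on the CPU, which is enough to trigger a device mismatch. A minimal standalone sketch of the same failure mode (hypothetical example, not OpenNMT-py code):

```python
import torch

if torch.cuda.is_available():
    scores = torch.rand(2, 3, 5, device="cuda")  # e.g. generator output on GPU
    fill = torch.Tensor([1, 2]).to(torch.int64)  # created on CPU by default
    score = scores[0]
    # Raises a RuntimeError because the CPU index tensor `fill` is used
    # against the CUDA tensor `score`.
    score.index_add_(1, fill, score.index_select(1, fill))
```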