Closed hughbzhang closed 5 years ago
Sorry, I have not seen this issue before. Which commit are you on?
Just saw this issue: https://github.com/pytorch/fairseq/issues/348
might be related? It seems resolved in this issue, could you take a look? Thank you!
Ah I pulled to the most recent version and now it works. Thanks a bunch!
I still have this problem, even after pulling the latest version.
Using the latest code, and executing the following command:
CUDA_VISIBLE_DEVICES=1 python train.py data-bin/iwslt14.tokenized.de-en --lr 0.25 --clip-norm 0.1 \
  --dropout 0.2 --max-tokens 4000 --arch transformer_iwslt_de_en \
  --save-dir checkpoints/transformer --log-interval 200 --no-progress-bar --save-interval-updates 600
When restoring from an unfinished checkpoint (that is, a checkpoint other than the per-epoch checkpoints, written at the interval set by the --save-interval-updates option), I get the following error:
File "/home/frank/fairseq-py/fairseq/trainer.py", line 250, in train_step
  self.meters['gnorm'].update(grad_norm)
File "/home/frank/fairseq-py/fairseq/meters.py", line 24, in update
  self.sum += val * n
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'other'
It runs fine when restoring from an epoch checkpoint file.
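For anyone hitting the same traceback: it suggests that the restored meter holds a CPU tensor in self.sum while grad_norm lives on the GPU. That reading is an assumption on my part, and the snippet below is not the fix that went into fairseq. It is only a minimal sketch of a hypothetical meter class that stores plain Python floats, so the device of the incoming value no longer matters:

```python
class AverageMeter:
    """Hypothetical running-average meter that only stores Python floats."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        # float() accepts Python numbers and single-element tensors
        # (CPU or CUDA), so self.sum never becomes a device-bound tensor.
        self.sum += float(val) * n
        self.count += n

    @property
    def avg(self):
        # Guard against division by zero before the first update.
        return self.sum / self.count if self.count else 0.0
```

Because the state is plain floats, it can be saved in and restored from a checkpoint without carrying a device along with it.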
And when I run generate.py with the following command:
CUDA_VISIBLE_DEVICES=1 python generate.py data-bin/iwslt14.tokenized.en-de --path checkpoints/transformer/checkpoint_best.pt --beam 5 --batch-size 128 --quiet
I see the same error as in the previous posts:
File "/home/frank/fairseq-py/fairseq/models/transformer.py", line 342, in reorder_encoder_out
  encoder_out['encoder_out'].index_select(1, new_order)
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'
Upgrading to PyTorch 0.4.1 solved the above problem.
I think the attached commit (#393) should also fix this on 0.4.0, right? The problem is that arange on 0.4.0 returns a FloatTensor, whereas from 0.4.1 onward it returns a LongTensor.
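To illustrate the dtype point, here is a minimal sketch (not the contents of #393) that requests the index dtype explicitly rather than relying on the default of arange. The tensor shapes, the beam size, and the variable names are made up for illustration:

```python
import torch

# Hypothetical encoder output shaped (src_len, batch, dim).
encoder_out = torch.randn(7, 4, 16)
bsz, beam_size = encoder_out.size(1), 2

# Ask for a LongTensor explicitly, so the result does not depend on
# which dtype a given PyTorch version picks for arange by default.
new_order = torch.arange(bsz, dtype=torch.long)

# Repeat each batch index beam_size times: [0, 0, 1, 1, 2, 2, 3, 3].
new_order = new_order.view(-1, 1).repeat(1, beam_size).view(-1)

# index_select requires an integer (long) index tensor.
reordered = encoder_out.index_select(1, new_order)
```

Calling .long() on an existing index tensor before index_select should have the same effect if the tensor is created elsewhere.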
I ran the following command from the examples stories tutorial using the pretrained checkpoints and couldn't get it to work. What is the correct command to generate from the pretrained story model? I saw https://github.com/pytorch/fairseq/issues/285 had a similar question, but I wasn't sure if it was resolved and if so what the correct command was.
My error is pasted below.