facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Can't Generate from Pretrained Story Models #383

Closed hughbzhang closed 5 years ago

hughbzhang commented 5 years ago

I ran the following command from the stories example tutorial using the pretrained checkpoints and couldn't get it to work. What is the correct command to generate from the pretrained story model? I saw that https://github.com/pytorch/fairseq/issues/285 asked a similar question, but I wasn't sure whether it was resolved and, if so, what the correct command was.

```
python generate.py data-bin/writingPrompts --path data-bin/models/fusion_checkpoint.pt --batch-size 32 --beam 1 --sampling --sampling-topk 10 --sampling-temperature 0.8 --nbest 1 --model-overrides "{'pretrained_checkpoint':'data-bin/models/pretrained_checkpoint.pt'}"
```

My error is pasted below.


```
| [wp_target] dictionary: 104960 types
| data-bin/writingPrompts test 15138 examples
| ['data-bin/writingPrompts'] test 15138 examples
| loading model(s) from data-bin/models/fusion_checkpoint.pt
| loading pretrained model
  0%|                                         | 0/474 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "generate.py", line 171, in <module>
    main(args)
  File "generate.py", line 104, in main
    for sample_id, src_tokens, target_tokens, hypos in translations:
  File "/juicier/scr121/scr/hughz/fairseq/fairseq/sequence_generator.py", line 95, in generate_batched_itr
    prefix_tokens=s['target'][:, :prefix_size] if prefix_size > 0 else None,
  File "/juicier/scr121/scr/hughz/fairseq/fairseq/sequence_generator.py", line 117, in generate
    return self._generate(encoder_input, beam_size, maxlen, prefix_tokens)
  File "/juicier/scr121/scr/hughz/fairseq/fairseq/sequence_generator.py", line 143, in _generate
    encoder_out = model.encoder.reorder_encoder_out(encoder_out, new_order)
  File "/juicier/scr121/scr/hughz/fairseq/fairseq/models/composite_encoder.py", line 48, in reorder_encoder_out
    encoder_out[key] = self.encoders[key].reorder_encoder_out(encoder_out[key], new_order)
  File "/juicier/scr121/scr/hughz/fairseq/fairseq/models/fconv_self_att.py", line 231, in reorder_encoder_out
    eo.index_select(0, new_order) for eo in encoder_out['encoder_out']
  File "/juicier/scr121/scr/hughz/fairseq/fairseq/models/fconv_self_att.py", line 231, in <genexpr>
    eo.index_select(0, new_order) for eo in encoder_out['encoder_out']
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'
```
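For context, this failure is reproducible outside fairseq: `index_select` requires an integer (Long) index tensor, so a float index like the one `torch.arange` produced on PyTorch 0.4.0 triggers exactly this error. A minimal sketch (the shapes here are illustrative, not fairseq's):

```python
import torch

x = torch.randn(4, 8)
new_order = torch.arange(4).float()  # what torch.arange effectively returned on 0.4.0

# x.index_select(0, new_order)       # raises: expected a LongTensor for the index argument

# Casting the index to long satisfies index_select on any 0.4.x release:
reordered = x.index_select(0, new_order.long())
```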
huihuifan commented 5 years ago

Sorry, I have not seen this issue before. Which commit are you on?

huihuifan commented 5 years ago

Just saw this issue: https://github.com/pytorch/fairseq/issues/348. It might be related, and it seems to have been resolved there. Could you take a look? Thank you!

hughbzhang commented 5 years ago

Ah, I pulled the most recent version and now it works. Thanks a bunch!

NurmaU commented 5 years ago

I still have this problem, even after pulling the latest version.

frankang commented 5 years ago

Using the latest code and executing the following command:

```
CUDA_VISIBLE_DEVICES=1 python train.py data-bin/iwslt14.tokenized.de-en --lr 0.25 --clip-norm 0.1 \
    --dropout 0.2 --max-tokens 4000 --arch transformer_iwslt_de_en \
    --save-dir checkpoints/transformer --log-interval 200 --no-progress-bar --save-interval-updates 600
```

When restoring from an unfinished checkpoint (that is, a checkpoint saved between the epoch checkpoints by the --save-interval-updates option), it raises the following error:

```
  File "/home/frank/fairseq-py/fairseq/trainer.py", line 250, in train_step
    self.meters['gnorm'].update(grad_norm)
  File "/home/frank/fairseq-py/fairseq/meters.py", line 24, in update
    self.sum += val * n
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'other'
```

It runs fine when restoring from the epoch checkpoint file.
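That meter error looks like a CPU/GPU mismatch rather than the index dtype issue: the meter state restored from a mid-epoch checkpoint lives on the CPU, while the incoming gradient norm is a CUDA tensor. A minimal sketch of the failure mode (assuming CUDA is available; this is not fairseq's actual meter code):

```python
import torch

meter_sum = torch.zeros(1)        # meter state restored on the CPU
grad_norm = torch.ones(1).cuda()  # gradient norm computed on the GPU

# meter_sum += grad_norm          # raises on 0.4.x: cannot mix CPU and CUDA tensors

# Reducing the GPU tensor to a Python float sidesteps the mismatch:
meter_sum += grad_norm.item()
```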

And when executing generate.py with:

```
CUDA_VISIBLE_DEVICES=1 python generate.py data-bin/iwslt14.tokenized.en-de --path checkpoints/transformer/checkpoint_best.pt --beam 5 --batch-size 128 --quiet
```

I see the following error, as in the previous posts:

```
  File "/home/frank/fairseq-py/fairseq/models/transformer.py", line 342, in reorder_encoder_out
    encoder_out['encoder_out'].index_select(1, new_order)
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument #3 'index'
```

frankang commented 5 years ago

Upgrading to PyTorch 0.4.1 solved the above problem.

myleott commented 5 years ago

I think the attached commit (#393) should also fix this on 0.4.0, right? The problem is that `torch.arange` on 0.4.0 returns a FloatTensor, whereas from 0.4.1 onward it returns a LongTensor.
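A quick illustration of that behavior change, with a version-agnostic workaround (the `dtype` keyword exists from 0.4.0 onward, so this should behave identically on both releases):

```python
import torch

# Implicit dtype: FloatTensor on 0.4.0, LongTensor from 0.4.1 onward.
new_order = torch.arange(5)

# Requesting the dtype explicitly is stable across versions:
new_order = torch.arange(5, dtype=torch.long)
print(new_order.dtype)  # torch.int64
```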