OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0

Bug in models/sequence_generator.py when reproducing image-captioning fine-tuning #353

Closed · Eccentric666 closed this issue 1 year ago

Eccentric666 commented 1 year ago

Hi, thank you for your great work!

I successfully ran stage 1 of the image-captioning fine-tuning. However, when I continued with stage 2 from your tutorial, i.e. nohup sh train_caption_stage2.sh > train_stage2.out &  # stage 2: load the best stage-1 checkpoint and train with CIDEr optimization, I ran into the error below.

Here is the traceback:

Traceback (most recent call last):
  File "../../train.py", line 537, in <module>
    cli_main()
  File "../../train.py", line 530, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/media/yi/D/python_vs/OFA-main/fairseq/fairseq/distributed/utils.py", line 389, in call_main
    main(cfg, **kwargs)
  File "../../train.py", line 199, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/yi/anaconda3/envs/ofa/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "../../train.py", line 310, in train
    log_output = trainer.train_step(samples)
  File "/home/yi/anaconda3/envs/ofa/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/media/yi/D/python_vs/OFA-main/trainer.py", line 780, in train_step
    **extra_kwargs,
  File "/media/yi/D/python_vs/OFA-main/tasks/ofa_task.py", line 334, in train_step
    loss, sample_size, logging_output = criterion(model, sample, update_num=update_num)
  File "/home/yi/anaconda3/envs/ofa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/yi/D/python_vs/OFA-main/criterions/scst_loss.py", line 88, in forward
    loss, score, ntokens, nsentences = self.compute_loss(model, sample, reduce=reduce)
  File "/media/yi/D/python_vs/OFA-main/criterions/scst_loss.py", line 239, in compute_loss
    gen_target, gen_res, gt_res = self.get_generator_out(model, sample)
  File "/media/yi/D/python_vs/OFA-main/criterions/scst_loss.py", line 149, in get_generator_out
    gen_out = self.task.scst_generator.generate([model], sample)
  File "/home/yi/anaconda3/envs/ofa/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/media/yi/D/python_vs/OFA-main/models/sequence_generator.py", line 207, in generate
    return self._generate(models, sample, **kwargs)
  File "/media/yi/D/python_vs/OFA-main/models/sequence_generator.py", line 490, in _generate
    assert step < max_len, f"{step} < {max_len}"
AssertionError: 16 < 16

I noticed that line 490, i.e. assert step < max_len, f"{step} < 7,118", sits inside the for loop at line 335, i.e. for step in range(max_len + 1):  # one extra step for EOS marker. Since the loop deliberately runs one extra iteration (step == max_len) to emit the EOS marker, how can the assertion at line 490 hold on the last iteration?
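For clarity, here is a minimal standalone sketch of the off-by-one as I understand it (max_len = 16 is just the value from my failing run, and the real generator normally breaks out of the loop earlier, e.g. once every sentence has finalized; this is not the actual OFA code):

    max_len = 16  # hypothetical; the value from my failing run

    for step in range(max_len + 1):  # one extra step for the EOS marker
        # If generation ever reaches the final iteration, step == max_len,
        # so an assertion of this form must fail, producing exactly the
        # reported "AssertionError: 16 < 16".
        assert step < max_len, f"{step} < {max_len}"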

Thank you in advance for your reply and suggestions! :)

Eccentric666 commented 1 year ago

One correction: I used the OFA_base model. For stage 1 I ran nohup sh train_caption_stage1_base.sh > train_stage1.out &, and for stage 2 I ran nohup sh train_caption_stage2_base.sh > train_stage2.out &.

maryhh commented 1 year ago

Add --freeze-resnet to your stage-2 script.
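Something like this (a sketch only; every argument except --freeze-resnet is a placeholder for the flags already in train_caption_stage2_base.sh):

    # In train_caption_stage2_base.sh, append --freeze-resnet to the
    # existing train.py invocation; the other arguments shown here
    # stand in for the ones already in the script.
    python ../../train.py \
        ${data} \
        --task=caption \
        --freeze-resnet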

Eccentric666 commented 1 year ago

> Add --freeze-resnet to your stage-2 script.

Thank you for your suggestion! It works!