Closed maryawwm closed 2 years ago
Hi @maryawwm , you can try to replace `model.done_beams` with `model.module.done_beams`.
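For anyone hitting the same error: `DataParallel` wraps the model and only forwards registered parameters and buffers, not custom Python attributes such as `done_beams`, so those must be reached through `.module`. A minimal pure-Python sketch of the situation (no torch; the stub class names are hypothetical):

```python
class DataParallelStub:
    """Stand-in for torch.nn.DataParallel: it holds the real model in
    .module and does not expose the wrapped model's custom attributes."""
    def __init__(self, module):
        self.module = module


class CaptionModelStub:
    """Stand-in for the captioning model, which stores finished beams
    in a plain Python attribute called done_beams."""
    def __init__(self):
        self.done_beams = [["beam 0"], ["beam 1"]]


dp_model = DataParallelStub(CaptionModelStub())

# dp_model.done_beams raises AttributeError: the wrapper has no such field.
# Going through .module reaches the wrapped model instead:
beams = dp_model.module.done_beams
print(len(beams))  # -> 2
```

The same pattern applies to any custom attribute or method on a model wrapped in `DataParallel` (or `DistributedDataParallel`).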
Thanks @YuanEZhou! It works.
Hi again,
After changing the beam size, when I run the second training stage I get a new error:
```
iter 330103 (epoch 29), avg_reward = 0.000, time/batch = 0.975
Read data: 0.4967498779296875
Save ckpt on exception ...
model saved to save/nsc-sat-2-from-nsc-seqkd/model.pth
Save ckpt done.
Traceback (most recent call last):
  File "train.py", line 213, in train
    model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag)
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/f/satic/misc/loss_wrapper.py", line 45, in forward
    reward = get_self_critical_reward(self.model, fc_feats, att_feats, att_masks, gts, gen_result, self.opt)
  File "/mnt/f/satic/misc/rewards.py", line 42, in get_self_critical_reward
    greedy_res, _ = model(fc_feats, att_feats, att_masks=att_masks, mode='sample')
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/f/satic/models/CaptionModel.py", line 33, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "/mnt/f/satic/models/SAT.py", line 396, in _sample
    p_fc_feats, p_att_feats, pp_att_feats, p_att_masks = self._prepare_feature(fc_feats, att_feats, att_masks)
  File "/mnt/f/satic/models/SAT.py", line 310, in _prepare_feature
    memory = self.model.encode(att_feats, att_masks)
  File "/mnt/f/satic/models/SAT.py", line 45, in encode
    return self.encoder(self.src_embed(src), src_mask)
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/f/satic/models/SAT.py", line 86, in forward
    x = layer(x, mask)
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/f/satic/models/SAT.py", line 128, in forward
    return self.sublayer[1](x, self.feed_forward)
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/f/satic/models/SAT.py", line 114, in forward
    return x + self.dropout(sublayer(self.norm(x)))
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/f/satic/models/SAT.py", line 219, in forward
    return self.w_2(self.dropout(F.relu(self.w_1(x))))
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/functional.py", line 1119, in relu
    result = torch.relu(input)
RuntimeError: CUDA error: unknown error
Terminating BlobFetcher
```
Hi @maryawwm , we usually set the beam size to 1 during training and to 3 during testing. This setting works well, and it is not really necessary to use a beam size of 3 during training. For that reason, I did not write the code to support a beam size greater than 1 during the second training stage.
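As a sketch of that convention, assuming the repository's usual `--beam_size` command-line flag and script names (check `opts.py` and the README for the exact ones), the two stages would be invoked roughly as:

```shell
# second training stage (self-critical): sample greedily, beam size 1
python train.py --beam_size 1 ...

# evaluation only: decode with beam search, beam size 3
python eval.py --beam_size 3 ...
```

In self-critical training the reward baseline comes from greedy decoding anyway, so beam search there mostly adds cost without changing the training signal.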
That's right. Thank you!
Hi,
I trained the code with a beam size of 1 and it worked well. Now I want to try other values, but when I set beam size 3 in the train script I get this error:
```
iter 2999 (epoch 0), train_loss = 0.770, time/batch = 0.202
250.90925693511963 ms needed to decode one sentence under batch size 10 and beam size 3
Traceback (most recent call last):
  File "train.py", line 325, in <module>
    train(opt)
  File "train.py", line 273, in train
    dp_model, lw_model.crit, loader, eval_kwargs)
  File "/mnt/f/satic/eval_utils.py", line 138, in eval_split
    sents_list = [utils.decode_sequence(loader.get_vocab(), _['seq'].unsqueeze(0))[0] for _ in model.done_beams[i]]
  File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'done_beams'
```
Can you help me fix that? (Your paper reports results with different beam sizes, so I assume the code should support them.)