YuanEZhou / satic


the code does not convert IntTensor to LongTensor #2

Open maryawwm opened 3 years ago

maryawwm commented 3 years ago

I'm trying to train this code with the same environment requirements: Python 3.6, PyTorch 1.6.

But when I run the first training stage, I get this error:

```
DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/mscoco/cocobu_fc data/mscoco/cocobu_att data/mscoco/cocobu_box data/cocotalk_seq-kd-from-nsc-transformer-baseline-b5_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
Read data: 0.046845197677612305
Save ckpt on exception ...
model saved to save/sat-2-from-nsc-seqkd\model.pth
Save ckpt done.
Traceback (most recent call last):
  File "train.py", line 213, in train
    model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag).to(device).long()
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\satic\misc\loss_wrapper.py", line 30, in forward
    student_output = self.model(fc_feats, att_feats, labels, att_masks)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\satic\models\CaptionModel.py", line 33, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "C:\Users\vision\satic\models\SAT.py", line 347, in _forward
    out = self.model(att_feats, seq, att_masks, seq_mask)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\satic\models\SAT.py", line 42, in forward
    tgt, tgt_mask)
  File "C:\Users\vision\satic\models\SAT.py", line 48, in decode
    return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\satic\models\SAT.py", line 228, in forward
    return self.lut(x) * math.sqrt(self.d_model)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)
```

YuanEZhou commented 3 years ago

Hi @maryawwm, my environment is Linux, and it seems you are running on Windows. In any case, this RuntimeError is caused by the unexpected dtype of the indices passed to `torch.embedding`. You can try changing this line to `out = self.model(att_feats, seq.long(), att_masks, seq_mask)`.
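
For anyone hitting the same thing, here is a minimal sketch (my own illustration, not the repo's code) of why the embedding lookup rejects IntTensor indices and how the `.long()` cast resolves it:

```python
import torch
import torch.nn as nn

# nn.Embedding indexes its weight table, so on PyTorch 1.6 the indices
# must be int64 (LongTensor); int32 (IntTensor) triggers the error above.
emb = nn.Embedding(num_embeddings=9487, embedding_dim=512)

seq = torch.randint(0, 9487, (2, 16), dtype=torch.int32)  # IntTensor indices
# emb(seq) -> RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long
out = emb(seq.long())  # cast to int64 before the lookup
print(out.shape)       # torch.Size([2, 16, 512])
```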

maryawwm commented 3 years ago

Hi, thanks for your response.

Yes, I'm using Windows, and your solution fixed that error, but after that I ran into two new errors:

```
C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\rnn.py:60: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "train.py", line 337, in <module>
    train(opt)
  File "train.py", line 225, in train
    model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.

Original Traceback (most recent call last):
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\satic\misc\loss_wrapper.py", line 32, in forward
    teacher_output = self.teacher(fc_feats, att_feats, labels, att_masks)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.

Original Traceback (most recent call last):
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\vision\.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vision\satic\models\CaptionModel.py", line 33, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "C:\Users\vision\satic\models\ShowTellModel.py", line 52, in _forward
    state = self.init_hidden(batch_size)
  File "C:\Users\vision\satic\models\ShowTellModel.py", line 43, in init_hidden
    weight = next(self.parameters()).data
StopIteration
```

YuanEZhou commented 3 years ago

Hi @maryawwm, first, please check whether you are actually using PyTorch 1.6. Second, if you have multiple GPUs on your machine, you can temporarily set `CUDA_VISIBLE_DEVICES=0` and try again.
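
For reference, one minimal way to do that on Windows (a sketch of mine, not part of the repo) is to set the variable inside Python before CUDA is initialized, e.g. at the very top of train.py:

```python
import os
# Must run before torch touches CUDA, e.g. at the very top of train.py.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import torch
print(torch.cuda.device_count())  # should now report 1, so nn.DataParallel uses a single replica
```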

YuanEZhou commented 3 years ago

You can also refer to the Multi-GPU Error issue.
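
For completeness: the StopIteration is raised by `next(self.parameters())` inside `init_hidden`, because `nn.DataParallel` replicas no longer expose their parameters through `parameters()` on PyTorch >= 1.5. A hedged sketch of one common workaround, written against an illustrative stand-in model rather than the repo's actual ShowTellModel, is to build the hidden state from a concrete tensor instead:

```python
import torch
import torch.nn as nn

class TinyShowTell(nn.Module):
    """Illustrative stand-in, not the repo's ShowTellModel."""
    def __init__(self, vocab_size=9487, rnn_size=512, num_layers=1):
        super().__init__()
        self.num_layers = num_layers
        self.rnn_size = rnn_size
        self.embed = nn.Embedding(vocab_size, rnn_size)
        self.rnn = nn.LSTM(rnn_size, rnn_size, num_layers, batch_first=True)

    def init_hidden(self, batch_size):
        # Instead of `weight = next(self.parameters()).data`, which raises
        # StopIteration inside a DataParallel replica on PyTorch >= 1.5,
        # derive device/dtype from a tensor that is always reachable.
        weight = self.embed.weight
        return (weight.new_zeros(self.num_layers, batch_size, self.rnn_size),
                weight.new_zeros(self.num_layers, batch_size, self.rnn_size))
```

Restricting training to a single GPU, as suggested above, avoids the problem without touching the model code.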