facebookresearch / grid-feats-vqa

Grid features pre-training code for visual question answering
https://arxiv.org/abs/2001.03615
Apache License 2.0

Grid Feature Size? (2048, 26, 19) vs (2048,25,19) #13

Closed CCYChongyanChen closed 4 years ago

CCYChongyanChen commented 4 years ago

Hi, I am trying to run grid+MCAN via MMF. I extracted the grid features, stored them as .pth files, and each .pth has size [2048, 26, 19]. When I run the code, I get a RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]. Could you help me with that? Thank you!

The full traceback is attached.

Traceback (most recent call last):
  File "/home/cc67459/MMF2/bin/mmf_run", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 118, in run
    nprocs=config.distributed.world_size,
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/mmf_trainer.py", line 108, in train
    self.training_loop()
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 36, in training_loop
    self.run_training_epoch()
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 67, in run_training_epoch
    for batch in self.train_loader:
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/batch_collator.py", line 24, in __call__
    sample_list = SampleList(batch)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/sample.py", line 129, in __init__
    self[field][idx] = self._get_data_copy(sample[field])
RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]
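For context, the failing line copies each sample's feature tensor into a batch buffer whose spatial shape was fixed by an earlier sample. A minimal standalone reproduction (not mmf's actual code, just the same tensor operation) is:

```python
import torch

# The collator pre-allocates a batch tensor from one sample's shape
# ([2048, 25, 19] here); copying a sample with a different spatial
# size ([2048, 26, 19]) into a slot of that buffer then fails.
first = torch.zeros(2048, 25, 19)              # shape taken from one sample
batch = first.unsqueeze(0).repeat(2, 1, 1, 1)  # batch buffer [2, 2048, 25, 19]
other = torch.zeros(2048, 26, 19)              # a sample with one extra row

try:
    batch[1] = other  # same copy the traceback shows
except RuntimeError as e:
    print(e)  # "The expanded size of the tensor (25) must match the existing size (26) ..."
```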

endernewton commented 4 years ago

Please raise mmf-related concerns in the mmf repo. That said, I suspect this error is caused by features with different spatial sizes being batched together, which could be fixed by padding them all to the maximum possible size.
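The padding idea above can be sketched as a collate helper that zero-pads every [C, H, W] feature map to the batch's maximum H and W before stacking. This is an illustrative sketch, not code from grid-feats-vqa or mmf; `pad_grid_features` is a hypothetical name:

```python
import torch

def pad_grid_features(features):
    """Zero-pad a list of [C, H, W] grid-feature tensors to the
    largest H and W in the list, then stack them into [N, C, H, W].
    (A sketch of the padding fix; not the mmf collator itself.)
    """
    max_h = max(f.shape[1] for f in features)
    max_w = max(f.shape[2] for f in features)
    batch = features[0].new_zeros(
        len(features), features[0].shape[0], max_h, max_w
    )
    for i, f in enumerate(features):
        # Copy each feature map into the top-left corner of its slot;
        # the remaining cells stay zero.
        batch[i, :, : f.shape[1], : f.shape[2]] = f
    return batch
```

With features of shapes [2048, 26, 19] and [2048, 25, 19], this yields a [2, 2048, 26, 19] batch instead of raising the size-mismatch error.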