Closed: CCYChongyanChen closed this issue 4 years ago
Please raise MMF-related concerns in the MMF repo. Also, I suspect this error is caused by features with different spatial sizes being batched together, which could be fixed by padding them to the maximum possible size.
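For illustration, here is a minimal sketch of that padding idea before stacking (this is not MMF's actual collator; the helper name and the `MAX_H` / `MAX_W` constants are assumptions and should match the largest spatial size your extractor produces):

```python
import torch
import torch.nn.functional as F

# Assumed maxima over the whole dataset (adjust to your extracted features).
MAX_H, MAX_W = 26, 19

def pad_grid_feature(feat: torch.Tensor) -> torch.Tensor:
    """Zero-pad a [C, H, W] grid feature up to [C, MAX_H, MAX_W]."""
    c, h, w = feat.shape
    # F.pad takes (left, right, top, bottom) padding for the last two dims.
    return F.pad(feat, (0, MAX_W - w, 0, MAX_H - h))

# Features of different spatial sizes now stack cleanly into one batch:
batch = torch.stack([
    pad_grid_feature(torch.randn(2048, 25, 19)),
    pad_grid_feature(torch.randn(2048, 26, 19)),
])  # -> [2, 2048, 26, 19]
```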
Hi, I am trying to run grid+MCAN via MMF. I extracted the grid features and stored them as .pth files; each .pth has size [2048, 26, 19]. When I run the code, I get a RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]. Could you help me with that? Thank you!
The full traceback is attached.
Traceback (most recent call last):
  File "/home/cc67459/MMF2/bin/mmf_run", line 33, in <module>
    sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 118, in run
    nprocs=config.distributed.world_size,
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 66, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf_cli/run.py", line 56, in main
    trainer.train()
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/mmf_trainer.py", line 108, in train
    self.training_loop()
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 36, in training_loop
    self.run_training_epoch()
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/trainers/core/training_loop.py", line 67, in run_training_epoch
    for batch in self.train_loader:
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/cc67459/MMF2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/batch_collator.py", line 24, in __call__
    sample_list = SampleList(batch)
  File "/home/cc67459/MMF2/mmf_8_2/mmf/mmf/common/sample.py", line 129, in __init__
    self[field][idx] = self._get_data_copy(sample[field])
RuntimeError: The expanded size of the tensor (25) must match the existing size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19]. Tensor sizes: [2048, 26, 19]
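For context, the final error can be reproduced outside MMF with plain tensors: judging from the traceback, SampleList pre-allocates the batched field from the first sample's shape and then copies each sample into it, so a sample with a different spatial size fails. A minimal sketch using the shapes from the error above (not MMF code):

```python
import torch

# Batch slot shaped after a first sample of [2048, 25, 19].
batch = torch.zeros(2, 2048, 25, 19)
# Another sample with one more spatial row.
other = torch.randn(2048, 26, 19)

batch[1] = other
# RuntimeError: The expanded size of the tensor (25) must match the existing
# size (26) at non-singleton dimension 1. Target sizes: [2048, 25, 19].
# Tensor sizes: [2048, 26, 19]
```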