er-muyue / BeMapNet


cannot reshape array of size 1 into shape (8) #17

Open enescingoz opened 10 months ago

enescingoz commented 10 months ago

Hello,

I'm trying to run the training process with the command mentioned in the README: `bash run.sh train bemapnet_nuscenes_swint 30`

I completed all of the setup steps, and my project directory is laid out exactly as described in the README.

But I get this error:

```
assets/weights/upernet_swin_tiny_patch4_window7_512x512.pth
Backbone missing_keys: []
Backbone unexpected_keys: ['norm0.weight', 'norm0.bias']
Traceback (most recent call last):
  File "configs/bemapnet_nuscenes_swint.py", line 314, in <module>
    BeMapNetCli(Exp).run()
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/core.py", line 172, in run
    self.dispatch(self.executor)
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/core.py", line 169, in dispatch
    executor_func()
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/core.py", line 153, in executor
    self.get_trainer().train()
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/core.py", line 143, in get_trainer
    trainer = Trainer(exp=exp, callbacks=callbacks, logger=logger, evaluator=evaluator)
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/executor.py", line 104, in __init__
    super(Trainer, self).__init__(exp, callbacks, logger)
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/executor.py", line 62, in __init__
    self._invoke_callback("after_init")
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/executor.py", line 93, in _invoke_callback
    func(self, *args, **kwargs)
  File "/home/adastec/catkin_bemapnet/BeMapNet/bemapnet/engine/environ.py", line 91, in after_init
    ranks = np.arange(self.world_size()).reshape(-1, self.sync_bn)
ValueError: cannot reshape array of size 1 into shape (8)
[Epoch]:   0%| | 0/30 [00:00<?, ?it/s
```

- Python version: 3.8.10
- CUDA version: 11.8
- Torch version: 2.0.1+cu117 (also tried 1.10.1+cu111)

doohyun-cho commented 10 months ago

I think your number of GPUs doesn't match the `--sync_bn` parameter in `run.sh`. If you're running with 4 GPUs, change `--sync_bn` to 4, not 8. I haven't found the reason inside the code yet.
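To illustrate (a minimal sketch, not code from the repo): the traceback ends in `ranks = np.arange(self.world_size()).reshape(-1, self.sync_bn)` in `bemapnet/engine/environ.py`, and `reshape(-1, sync_bn)` only succeeds when the world size (number of launched processes) is divisible by `sync_bn`. With a single process and `--sync_bn 8` you get exactly this error:

```python
import numpy as np

# Sketch of the failing line reported in the traceback:
#     ranks = np.arange(self.world_size()).reshape(-1, self.sync_bn)
# reshape(-1, sync_bn) requires world_size % sync_bn == 0.

world_size = 1   # processes actually launched (a single GPU here)
sync_bn = 8      # the --sync_bn value in run.sh

try:
    ranks = np.arange(world_size).reshape(-1, sync_bn)
except ValueError as e:
    print(e)  # cannot reshape array of size 1 into shape (8)

# With sync_bn equal to (or a divisor of) the real process count the reshape
# works; presumably each row is one group of ranks sharing SyncBN statistics:
print(np.arange(8).reshape(-1, 4))   # e.g. 8 processes, groups of 4
# [[0 1 2 3]
#  [4 5 6 7]]

# Note: with a single GPU (world_size == 1), the only value that can work
# is --sync_bn 1.
```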

enescingoz commented 10 months ago

> I think your number of GPUs doesn't match the `--sync_bn` parameter in `run.sh`. If you're running with 4 GPUs, change `--sync_bn` to 4, not 8. I haven't found the reason inside the code yet.

I think the problem is related to the `sync_bn` parameter in `run.sh`.

I changed this parameter to 4, but it gives the same error (cannot reshape array of size 1 into shape (8)). When I set it to 1, I get a CUDA out-of-memory error instead:

```
RuntimeError: CUDA out of memory. Tried to allocate 292.00 MiB (GPU 0; 7.80 GiB total capacity; 4.92 GiB already allocated; 187.94 MiB free; 5.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
```

(screenshot of the error attached)
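For the out-of-memory part (a side note, not a fix from the repo): with `--sync_bn 1` on a single 8 GiB GPU the model may simply not fit at the default batch size. The error message itself suggests trying the allocator's `max_split_size_mb` option, which has to be set in the environment before CUDA is initialized, e.g.:

```python
import os

# Sketch only: the OOM message suggests max_split_size_mb to reduce
# fragmentation. The value 128 is illustrative, not taken from the repo.
# It must be set before the first CUDA allocation, so set it before
# initializing torch (or export it in the shell before running run.sh).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the allocator sees it
```

If fragmentation is not the issue, lowering the per-GPU batch size in the experiment config is usually the more direct remedy.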

liangshanOral commented 9 months ago

> I think your number of GPUs doesn't match the `--sync_bn` parameter in `run.sh`. If you're running with 4 GPUs, change `--sync_bn` to 4, not 8. I haven't found the reason inside the code yet.

I want to know why it works that way. Could you explain? Thank you! @doohyun-cho