facebookresearch / sound-spaces

A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.
https://soundspaces.org
Creative Commons Attribution 4.0 International

Data size error of the depth sensor #93

Closed gyx-gloria closed 1 year ago

gyx-gloria commented 1 year ago

When I run ss_baselines/av_nav/run.py with the depth sensor, I always get an error like this:

Traceback (most recent call last):
  File "ss_baselines/av_nav/run.py", line 101, in <module>
    main()
  File "ss_baselines/av_nav/run.py", line 95, in main
    trainer.train()
  File "/home/gyx/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 267, in train
    rollouts.observations[sensor][0].copy_(batch[sensor])
RuntimeError: The size of tensor a (5) must match the size of tensor b (128) at non-singleton dimension 1

I then printed the sizes of rollouts.observations[sensor][0] and batch[sensor] and found:

rollouts.observations[sensor][0].shape: torch.Size([5, 128, 128, 1])
batch[sensor].shape: torch.Size([5, 128, 128, 1, 1])

It seems the sizes of these two tensors differ, so how can I fix this problem?
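
The mismatch can be reproduced in isolation with the shapes printed above; this is just an illustrative sketch, and the tensor names below are made up:

```python
import torch

# Shapes copied from the printouts above: the rollout buffer slot is 4-D,
# while the batched depth observation carries an extra trailing singleton dim.
rollout_slot = torch.zeros(5, 128, 128, 1)
depth_batch = torch.zeros(5, 128, 128, 1, 1)

try:
    # Broadcasting aligns dimensions from the right, so dimension 1 ends up
    # comparing 5 against 128 and the copy fails.
    rollout_slot.copy_(depth_batch)
except RuntimeError as err:
    print(err)  # same message as in the traceback above
```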

Thank you very much!

ChanganVR commented 1 year ago

@doudoudou1999 may I know which Habitat version you are using?

shizi991016 commented 1 year ago

Hi, I got the same problem, with habitat-sim v0.2.2 installed from source at the commit mentioned in the sound-spaces installation instructions (built with --bullet and --audio), and habitat-lab v0.2.1 from the "Version bump v0.2.1" commit (#669).

When I run the example "Training AudioGoal with depth sensor on Replica", I get the same error:

2022-09-30 10:46:35,471 agent number of parameters: 4346693
Traceback (most recent call last):
  File "ss_baselines/av_nav/run.py", line 101, in <module>
    main()
  File "ss_baselines/av_nav/run.py", line 95, in main
    trainer.train()
  File "/home/shizi9/Workspace/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 267, in train
    rollouts.observations[sensor][0].copy_(batch[sensor])
RuntimeError: The size of tensor a (5) must match the size of tensor b (128) at non-singleton dimension 1
Exception ignored in: <function VectorEnv.__del__ at 0x7f13a8366c20>
Traceback (most recent call last):
  File "/home/shizi9/Workspace/habitat-lab-0.2.1/habitat/core/vector_env.py", line 592, in __del__
    self.close()
  File "/home/shizi9/Workspace/habitat-lab-0.2.1/habitat/core/vector_env.py", line 463, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/shizi9/Workspace/habitat-lab-0.2.1/habitat/core/vector_env.py", line 118, in __call__
    self.write_fn(data)
  File "/home/shizi9/Workspace/habitat-lab-0.2.1/habitat/utils/pickle5_multiprocessing.py", line 63, in send
    self.send_bytes(buf.getvalue())
  File "/home/shizi9/.conda/envs/habitat-0.2.1/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/shizi9/.conda/envs/habitat-0.2.1/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/shizi9/.conda/envs/habitat-0.2.1/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Like doudoudou1999, I get the same sizes:

rollouts.observations[sensor][0].shape: torch.Size([5, 128, 128, 1])
batch[sensor].shape: torch.Size([5, 128, 128, 1, 1])

So I would like to know how to deal with this problem.

Thank you very much!

ichbill commented 1 year ago

Hi, I have the same issue. Did you solve it?

shizi991016 commented 1 year ago

> Hi, I have the same issue. Did you solve it?

@ichbill Hi, I simply removed the last dimension of batch[sensor], although I have no idea which dimension is the extra one. In this link you may find more information about this problem. You can also follow the instructions written by dosssman in this link; I succeeded with those instructions.
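
Concretely, removing the "last column" amounts to squeezing the trailing singleton dimension before the copy. Below is a minimal sketch of that workaround with the shapes from this thread; the helper name is made up, and the proper fix is the installation guide mentioned in the next reply:

```python
import torch


def drop_extra_depth_dim(obs: torch.Tensor, expected_dims: int) -> torch.Tensor:
    """Hypothetical helper: squeeze a trailing singleton dimension when the
    observation has one more dimension than the rollout buffer expects."""
    if obs.dim() == expected_dims + 1 and obs.shape[-1] == 1:
        return obs.squeeze(-1)
    return obs


# With the shapes reported above, the copy then succeeds.
rollout_slot = torch.zeros(5, 128, 128, 1)
depth_batch = torch.zeros(5, 128, 128, 1, 1)
rollout_slot.copy_(drop_extra_depth_dim(depth_batch, rollout_slot.dim()))
```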

ChanganVR commented 1 year ago

@ichbill @shizi991016 @doudoudou1999 this issue occurs because the visual observations are rendered with an incorrect Habitat version. I just uploaded a step-by-step installation guide that you can follow to install the repo and run the code. Let me know if you have any questions!
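
A quick way to double-check that an environment matches the guide is to print the installed versions. This is only a sketch and assumes both packages expose __version__ (source installs may report it differently):

```python
import habitat
import habitat_sim

# Print the installed versions to compare against the installation guide.
print("habitat-lab:", getattr(habitat, "__version__", "unknown"))
print("habitat-sim:", getattr(habitat_sim, "__version__", "unknown"))
```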

shizi991016 commented 1 year ago

@ChanganVR Hi Changan, thanks for your guidance! I have successfully trained the model in both the continuous and discrete environments. By the way, I found that the segmentation fault that appears after several hours of continuous-environment training may also be caused by incorrect multi-GPU settings; I debugged it with gdb and got a SIGSEGV signal. I'm currently training on a single GPU and have completed 2M steps in the continuous environment. I will look into this question further.
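
For long runs where attaching gdb is inconvenient, one lighter-weight option (a sketch, not part of the repo) is Python's built-in faulthandler, which dumps the Python stack of every thread when a SIGSEGV arrives:

```python
import faulthandler

# Enable near the top of the training entry point (e.g. ss_baselines/av_nav/run.py)
# so a segmentation fault prints the Python stack of all threads before the
# process dies.
faulthandler.enable(all_threads=True)
```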