Closed gyx-gloria closed 1 year ago
@doudoudou1999 may I know which Habitat version you are using?
Hi, I got the same problem with habitat-sim v0.2.2, installed from source (with --bullet and --audio) at the commit mentioned in the sound-spaces installation instructions, and habitat v0.2.1 from the "Version bump v0.2.1 (#669)" commit.
When I run the example "Training AudioGoal with depth sensor on Replica", I get the same error:
2022-09-30 10:46:35,471 agent number of parameters: 4346693
Traceback (most recent call last):
File "ss_baselines/av_nav/run.py", line 101, in <module>
Like @doudoudou1999, I got the same sizes:
rollouts.observations[sensor][0].shape torch.Size([5, 128, 128, 1])
batch[sensor].shape torch.Size([5, 128, 128, 1, 1])
So I would like to know how to deal with this problem.
Thank you very much!
Hi, I have the same issue. Did you solve it?
@ichbill Hi, I simply removed the last dimension of "batch[sensor]", although I am not sure which of the two trailing dimensions is the extra one. In this link you may find more information about this problem. You can also follow the instructions written by dosssman in this link. They worked for me.
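The workaround described above can be sketched as follows. This is a minimal illustration, assuming PyTorch is installed and that the extra dimension is a trailing singleton, as in the shapes reported in this thread. The helper name match_rollout_shape is hypothetical and not part of sound-spaces. It only hides the symptom and does not address why the extra dimension appears.

```python
import torch


def match_rollout_shape(obs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Return obs with one trailing singleton dimension dropped if that makes
    it match target's shape. Hypothetical helper for illustration only."""
    if obs.shape == target.shape:
        return obs
    has_extra_trailing_dim = (
        obs.dim() == target.dim() + 1
        and obs.shape[-1] == 1
        and obs.shape[:-1] == target.shape
    )
    if has_extra_trailing_dim:
        return obs.squeeze(-1)
    # Refuse to guess for any other kind of mismatch.
    raise ValueError(
        f"cannot reconcile {tuple(obs.shape)} with {tuple(target.shape)}"
    )


# Shapes taken from the error reports in this thread.
rollout_slot = torch.zeros(5, 128, 128, 1)
batch_depth = torch.rand(5, 128, 128, 1, 1)

fixed = match_rollout_shape(batch_depth, rollout_slot)
rollout_slot.copy_(fixed)  # succeeds once the shapes agree
print(fixed.shape)
```

Checking the shape explicitly before squeezing avoids silently dropping a dimension that is not actually the extra one.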
@ichbill @shizi991016 @doudoudou1999 this issue occurs because the visual observations are rendered with an incorrect Habitat version. I just uploaded a step-by-step installation guide that you can follow to install the repo and run the code. Let me know if you have any questions!
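Since the cause is a version mismatch, it can help to print what is actually installed before comparing against the installation guide. The snippet below is a small sketch using only the standard library. The distribution names "habitat-sim" and "habitat" are assumptions about how the packages register themselves, and the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it
    is not installed in the current environment."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names; adjust them if your install registers differently.
report = installed_versions(["habitat-sim", "habitat"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

For a source build, the version string alone may not identify the exact commit, so it is worth also checking the checked-out commit in each repository.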
@ChanganVR Hi Changan, thanks for your guidance! I have successfully trained the model in both the continuous and the discrete environments. By the way, I found that the segmentation fault that appears after several hours of training in the continuous environment may also be caused by incorrect multi-GPU settings. I debugged with gdb and saw the signal SIGSEGV. I'm currently using one GPU for training and have trained 2M steps in the continuous environment. I will look into this further.
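One way to try the single-GPU setup mentioned above is sketched below, assuming the training process selects devices through CUDA and that this code runs before any CUDA-using library is imported. The helper name restrict_to_single_gpu is hypothetical. Python's faulthandler module is enabled as well so that a SIGSEGV prints the Python-level traceback, which can complement a gdb session.

```python
import faulthandler
import os


def restrict_to_single_gpu(gpu_index: int = 0) -> None:
    """Expose only one GPU to this process. This must run before CUDA is
    initialized, so call it before importing torch or habitat_sim."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)


restrict_to_single_gpu(0)

# Dump the Python traceback to stderr on fatal signals such as SIGSEGV.
if not faulthandler.is_enabled():
    faulthandler.enable()

print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had by setting CUDA_VISIBLE_DEVICES in the shell before launching run.py. The training config may contain its own GPU settings, which this sketch does not touch.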
When I run run.py with the depth sensor, I always get an error like:
Traceback (most recent call last):
File "ss_baselines/av_nav/run.py", line 101, in <module>
main()
File "ss_baselines/av_nav/run.py", line 95, in main
trainer.train()
File "/home/gyx/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 267, in train
rollouts.observations[sensor][0].copy_(batch[sensor])
RuntimeError: The size of tensor a (5) must match the size of tensor b (128) at non-singleton dimension 1
Then I printed the sizes of rollouts.observations[sensor][0] and batch[sensor] and found:
rollouts.observations[sensor][0].shape torch.Size([5, 128, 128, 1])
batch[sensor].shape torch.Size([5, 128, 128, 1, 1])
It seems that the sizes of these two tensors are different, so how can I fix this problem?
Thank you very much!
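The dimension index in the RuntimeError above follows from how broadcasting right-aligns the two shapes before comparing them. The following is a pure-Python sketch of that comparison, written for illustration and not taken from PyTorch's implementation. The function name first_broadcast_conflict is hypothetical.

```python
def first_broadcast_conflict(shape_a, shape_b):
    """Right-align two shapes, pad the shorter one with leading 1s, and
    return (dim, size_a, size_b) for the first incompatible dimension when
    scanning from the last dimension backwards, or None if they broadcast."""
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for dim in range(ndim - 1, -1, -1):
        # Sizes are compatible when they are equal or when either one is 1.
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            return dim, a[dim], b[dim]
    return None


rollout_shape = (5, 128, 128, 1)    # rollouts.observations[sensor][0]
batch_shape = (5, 128, 128, 1, 1)   # batch[sensor]

conflict = first_broadcast_conflict(rollout_shape, batch_shape)
print(conflict)  # (1, 5, 128)
```

After padding, the rollout shape becomes (1, 5, 128, 128, 1), so its size 5 lines up against size 128 of the batch at dimension 1, which matches the sizes and index in the error message. With the extra trailing dimension removed from the batch, the two shapes are identical and no conflict remains.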