Closed: zhibinQiu closed this issue 2 years ago
Hey there. Not the author, but I have been running a similar experiment in a similar setting (a 3090, namely), though based on mp3d. It gets around 25 FPS though. I also trained for around 6 days and it reached around 13 million steps.
The main difference I spotted is that in the audiogoal_depth_ddppo.yaml for mp3d, NUM_PROCESSES is set to 10 instead of 5 as in the replica config.
NUM_PROCESSES defines the number of environments that run in parallel for the agent to interact with. Since your NUM_PROCESSES is half of mine, I think the speed you are getting is quite consistent with that of my experiment too.
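As a quick sanity check, you can print the value that actually gets loaded from each config. This is only a rough sketch: the get_config import path and the mp3d config path are assumptions based on the repository layout mentioned in this thread, so adjust them to your checkout.

```python
# Rough sketch (not official SoundSpaces docs): print NUM_PROCESSES as loaded
# from the two baseline configs. Both the import path and the mp3d config path
# are assumptions based on this thread; adjust them to match your checkout.
from ss_baselines.av_nav.config.default import get_config

for cfg_path in [
    "ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth_ddppo.yaml",
    "ss_baselines/av_nav/config/audionav/mp3d/train_telephone/audiogoal_depth_ddppo.yaml",
]:
    cfg = get_config(cfg_path)
    print(cfg_path, "->", cfg.NUM_PROCESSES)
```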
Now, given how good your curves are looking, namely Metrics/success, I would say you might not need to train that long anyway. Depending on your use case, you could just train for 2 million steps instead; the curves show it is already learning pretty well.
@dosssman
Sorry for replying so late. An interesting phenomenon: when I set num_processes
to 9 (9 scenes for training in Replica), training seems to converge more slowly and the results seem worse (the blue line).
Hey @zhibinQiu, no worries.
Do you mean 9 parallel environments?
A wild guess is that fewer parallel environments could mean the agent has less data to use for each update, which in turn leads to it learning more slowly than with num_envs = 10. Namely, with NUM_PROCESSES = 10, the PPO algorithm updates the policy / value network using a rollout size of 10 * 150 = 1500 samples, but with NUM_PROCESSES = 9 it has 150 fewer samples to work with.
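Just to make those numbers concrete (the 150 is the per-environment rollout length quoted above):

```python
# Worked version of the rollout-size arithmetic above; 150 is the
# per-environment rollout length used in the numbers quoted in this thread.
num_steps_per_env = 150

for num_processes in (5, 9, 10):
    samples_per_update = num_processes * num_steps_per_env
    print(f"NUM_PROCESSES={num_processes}: {samples_per_update} samples per PPO update")
```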
Also, reducing NUM_PROCESSES might not necessarily make it faster, because one of the time-consuming steps is the RNN forward loop, which cannot be parallelized on the GPU.
@dosssman The num_processes set before was 5 (the orange line), and now it is changed to 9, but it seems to be slower 😂
Yeah, hard to know the reason with all those variables. It could also be that running a lot of environment simulators puts more load on the GPU and makes the simulation itself slower ...
From the SS2.0 paper:
7.7 Training and Implementation Details of the Navigation Benchmark
In this continuous audio-visual navigation benchmark, we use the AV-Nav agent [14] with the decentralized distributed proximal policy optimization (DD-PPO) [75]. We train the navigation policy for 80 million steps on the same AudioGoal navigation dataset [14] except that the movement and audio are continuous. For continuous navigation, we define success as the agent issuing a stop action within 1m of the goal location. We train the policy on 32 GPUs for 46 hours to converge.
Making it faster will probably require beefing up the hardware more than anything else.
@dosssman You are so right, I run multiple programs on one graphics card 😂. Thank you for your help all this time. I will put this question aside for a while. I want to know how to add another sound source using SoundSpaces 2.0; if you have time, can you tell me the specific code location? I have spent a long time on this 😭
I am not doing much haha :sweat_smile:
By "adding another sound source", do you mean having another sound source on top of the ringing telephone at the same time ?
Personally, I am more interested in having different types of sound sources for my use case. In case you have not seen it already, the closest thing I could find to that goal was Semantic Audio-Visual Navigation (SAVi) https://vision.cs.utexas.edu/projects/semantic-audio-visual-navigation/ which has support for other sound sources. Unfortunately, I could not get it to run using the SS2.0 continuous simulator (#85), so I have fallen back to SS1.0's dataset-based variant for now.
I will let you know if I find anything related to adding and using other sound sources, anyway. Best of luck on your side too.
@zhibinQiu The acoustic propagation engine for SoundSpaces 2.0 runs on the CPU. If you run too many jobs on the same node, the simulation time will increase. You can play around with the trade-off between simulation efficiency and accuracy by tuning the parameters.
@zhibinQiu @dosssman Sound rendering for multiple sources is additive. To compute the sound received from multiple sound sources, you basically need to compute the IR for each location, convolve them with each source sound, and add the waveforms together. Both SS 1.0 and SS 2.0 support this feature.
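For what it's worth, here is a minimal sketch of that additive rendering, assuming you already have one impulse response per source for the agent's current location and the corresponding mono source waveforms as 1-D NumPy arrays at the same sample rate (the function and variable names are illustrative, not SoundSpaces API):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_multi_source(source_waveforms, impulse_responses):
    """Convolve each source waveform with its IR and sum the results.

    Illustrative sketch only: assumes mono signals at the same sample rate,
    with the IRs computed for the agent's current location (e.g. via SS 2.0).
    """
    rendered = [
        fftconvolve(src, ir, mode="full")
        for src, ir in zip(source_waveforms, impulse_responses)
    ]
    # Convolution outputs have different lengths, so pad to the longest
    # before summing the waveforms.
    max_len = max(len(r) for r in rendered)
    mixed = np.zeros(max_len)
    for r in rendered:
        mixed[: len(r)] += r
    return mixed
```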
Hi @ChanganVR, when I run:
python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth_ddppo.yaml \
    --model-dir data/models/ss2/replica/dav_nav CONTINUOUS True
My program has been running for 6 days and I found it to be very slow. Here are my training log and nvidia-smi info,
and my TensorBoard curves.
My environment is:
Ubuntu 20.04.3 LTS, RTX 3090, Python 3.9, PyTorch 1.12.0, habitat-sim and habitat 0.2.2
What's more, my config follows the default sound-spaces/ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth_ddppo.yaml.
Is this normal? I would be very grateful if you could give me some help.