facebookresearch / sound-spaces

A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.
https://soundspaces.org
Creative Commons Attribution 4.0 International

Training time is so long when I use ss2 for Replica. #86

Closed zhibinQiu closed 2 years ago

zhibinQiu commented 2 years ago

Hi @ChanganVR. When I run:

python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth_ddppo.yaml \
    --model-dir data/models/ss2/replica/dav_nav CONTINUOUS True

my program has been running for 6 days and I find it very slow. Here is my training log:

2022-08-02 05:34:25,092 update: 6770    env-time: 487357.394s   pth-time: 12964.147s    frames: 5078250.0
2022-08-02 05:34:25,092 Average window size: 50  distance_to_goal: 0.593  na: 71.315  normalized_distance_to_goal: 0.125  reward: 12.767  sna: 0.287  softspl: 0.658  spl: 0.724  success: 0.924
2022-08-02 05:45:39,022 update: 6780    fps: 10.142 
2022-08-02 05:45:39,023 update: 6780    env-time: 488018.482s   pth-time: 12976.478s    frames: 5085750.0
2022-08-02 05:45:39,023 Average window size: 50  distance_to_goal: 0.563  na: 69.996  normalized_distance_to_goal: 0.115  reward: 13.095  sna: 0.297  softspl: 0.672  spl: 0.741  success: 0.939
2022-08-02 05:58:52,105 update: 6790    fps: 10.141 
2022-08-02 05:58:52,106 update: 6790    env-time: 488797.508s   pth-time: 12989.980s    frames: 5093250.0
2022-08-02 05:58:52,106 Average window size: 50  distance_to_goal: 0.566  na: 69.698  normalized_distance_to_goal: 0.118  reward: 13.089  sna: 0.297  softspl: 0.670  spl: 0.742  success: 0.941
2022-08-02 06:10:10,181 update: 6800    fps: 10.142 
2022-08-02 06:10:10,181 update: 6800    env-time: 489463.316s   pth-time: 13001.763s    frames: 5100750.0
2022-08-02 06:10:10,181 Average window size: 50  distance_to_goal: 0.571  na: 70.406  normalized_distance_to_goal: 0.119  reward: 13.024  sna: 0.296  softspl: 0.666  spl: 0.742  success: 0.944
2022-08-02 06:21:23,062 update: 6810    fps: 10.143 
2022-08-02 06:21:23,062 update: 6810    env-time: 490123.468s   pth-time: 13013.938s    frames: 5108250.0
2022-08-02 06:21:23,062 Average window size: 50  distance_to_goal: 0.582  na: 70.244  normalized_distance_to_goal: 0.120  reward: 12.975  sna: 0.297  softspl: 0.663  spl: 0.737  success: 0.943
2022-08-02 06:32:45,673 update: 6820    fps: 10.145 
2022-08-02 06:32:45,673 update: 6820    env-time: 490793.171s   pth-time: 13026.332s    frames: 5115750.0
2022-08-02 06:32:45,673 Average window size: 50  distance_to_goal: 0.619  na: 70.653  normalized_distance_to_goal: 0.124  reward: 12.863  sna: 0.290  softspl: 0.662  spl: 0.732  success: 0.933
2022-08-02 06:44:08,866 update: 6830    fps: 10.146 
2022-08-02 06:44:08,867 update: 6830    env-time: 491464.132s   pth-time: 13038.076s    frames: 5123250.0
2022-08-02 06:44:08,867 Average window size: 50  distance_to_goal: 0.649  na: 70.031  normalized_distance_to_goal: 0.132  reward: 12.648  sna: 0.289  softspl: 0.658  spl: 0.724  success: 0.918
2022-08-02 06:57:50,008 update: 6840    fps: 10.144 
2022-08-02 06:57:50,009 update: 6840    env-time: 492271.018s   pth-time: 13051.758s    frames: 5130750.0
2022-08-02 06:57:50,009 Average window size: 50  distance_to_goal: 0.649  na: 67.889  normalized_distance_to_goal: 0.132  reward: 12.589  sna: 0.293  softspl: 0.657  spl: 0.721  success: 0.915
2022-08-02 07:09:37,705 update: 6850    fps: 10.145 
2022-08-02 07:09:37,705 update: 6850    env-time: 492965.598s   pth-time: 13064.353s    frames: 5138250.0
2022-08-02 07:09:37,705 Average window size: 50  distance_to_goal: 0.639  na: 68.134  normalized_distance_to_goal: 0.132  reward: 12.707  sna: 0.290  softspl: 0.663  spl: 0.733  success: 0.920
2022-08-02 07:21:06,454 update: 6860    fps: 10.146 
2022-08-02 07:21:06,454 update: 6860    env-time: 493641.380s   pth-time: 13076.771s    frames: 5145750.0
2022-08-02 07:21:06,454 Average window size: 50  distance_to_goal: 0.642  na: 68.503  normalized_distance_to_goal: 0.131  reward: 12.792  sna: 0.288  softspl: 0.663  spl: 0.735  success: 0.926
2022-08-02 07:32:39,977 update: 6870    fps: 10.147 
2022-08-02 07:32:39,977 update: 6870    env-time: 494321.768s   pth-time: 13089.401s    frames: 5153250.0
2022-08-02 07:32:39,977 Average window size: 50  distance_to_goal: 0.641  na: 68.172  normalized_distance_to_goal: 0.129  reward: 12.838  sna: 0.291  softspl: 0.655  spl: 0.729  success: 0.930
2022-08-02 07:44:20,468 update: 6880    fps: 10.147 
2022-08-02 07:44:20,469 update: 6880    env-time: 495008.889s   pth-time: 13102.271s    frames: 5160750.0
2022-08-02 07:44:20,469 Average window size: 50  distance_to_goal: 0.623  na: 67.284  normalized_distance_to_goal: 0.126  reward: 12.851  sna: 0.290  softspl: 0.654  spl: 0.727  success: 0.934
2022-08-02 07:58:27,888 update: 6890    fps: 10.145 
2022-08-02 07:58:27,889 update: 6890    env-time: 495841.752s   pth-time: 13116.255s    frames: 5168250.0
2022-08-02 07:58:27,889 Average window size: 50  distance_to_goal: 0.621  na: 69.587  normalized_distance_to_goal: 0.121  reward: 13.005  sna: 0.288  softspl: 0.658  spl: 0.729  success: 0.936
2022-08-02 08:09:46,011 update: 6900    fps: 10.146 
2022-08-02 08:09:46,011 update: 6900    env-time: 496506.922s   pth-time: 13128.697s    frames: 5175750.0
2022-08-02 08:09:46,011 Average window size: 50  distance_to_goal: 0.614  na: 68.037  normalized_distance_to_goal: 0.119  reward: 13.079  sna: 0.298  softspl: 0.659  spl: 0.726  success: 0.937
2022-08-02 08:21:00,265 update: 6910    fps: 10.148 
2022-08-02 08:21:00,265 update: 6910    env-time: 497168.011s   pth-time: 13141.280s    frames: 5183250.0
2022-08-02 08:21:00,265 Average window size: 50  distance_to_goal: 0.635  na: 67.116  normalized_distance_to_goal: 0.122  reward: 13.031  sna: 0.304  softspl: 0.661  spl: 0.731  success: 0.935
2022-08-02 08:32:17,861 update: 6920    fps: 10.149 
2022-08-02 08:32:17,861 update: 6920    env-time: 497832.692s   pth-time: 13153.691s    frames: 5190750.0
2022-08-02 08:32:17,861 Average window size: 50  distance_to_goal: 0.615  na: 66.651  normalized_distance_to_goal: 0.121  reward: 13.016  sna: 0.304  softspl: 0.675  spl: 0.740  success: 0.927
2022-08-02 08:44:14,803 update: 6930    fps: 10.149 
2022-08-02 08:44:14,804 update: 6930    env-time: 498536.060s   pth-time: 13166.724s    frames: 5198250.0
2022-08-02 08:44:14,804 Average window size: 50  distance_to_goal: 0.616  na: 66.540  normalized_distance_to_goal: 0.122  reward: 13.078  sna: 0.299  softspl: 0.676  spl: 0.746  success: 0.932
2022-08-02 08:58:24,704 update: 6940    fps: 10.147 
2022-08-02 08:58:24,704 update: 6940    env-time: 499371.326s   pth-time: 13180.771s    frames: 5205750.0
2022-08-02 08:58:24,704 Average window size: 50  distance_to_goal: 0.613  na: 65.310  normalized_distance_to_goal: 0.124  reward: 13.002  sna: 0.301  softspl: 0.679  spl: 0.754  success: 0.934
2022-08-02 09:09:48,442 update: 6950    fps: 10.148 
2022-08-02 09:09:48,443 update: 6950    env-time: 500042.252s   pth-time: 13193.046s    frames: 5213250.0
2022-08-02 09:09:48,443 Average window size: 50  distance_to_goal: 0.619  na: 65.568  normalized_distance_to_goal: 0.127  reward: 12.876  sna: 0.296  softspl: 0.680  spl: 0.755  success: 0.930
2022-08-02 09:21:08,736 update: 6960    fps: 10.149 
2022-08-02 09:21:08,736 update: 6960    env-time: 500709.954s   pth-time: 13205.087s    frames: 5220750.0
2022-08-02 09:21:08,736 Average window size: 50  distance_to_goal: 0.602  na: 67.733  normalized_distance_to_goal: 0.123  reward: 12.794  sna: 0.288  softspl: 0.679  spl: 0.747  success: 0.926
2022-08-02 09:32:34,455 update: 6970    fps: 10.150 
2022-08-02 09:32:34,456 update: 6970    env-time: 501383.023s   pth-time: 13217.220s    frames: 5228250.0
2022-08-02 09:32:34,456 Average window size: 50  distance_to_goal: 0.603  na: 70.158  normalized_distance_to_goal: 0.123  reward: 12.881  sna: 0.286  softspl: 0.672  spl: 0.748  success: 0.939

and the nvidia-smi output:

image

and TensorBoard:

image

My environment is:

Ubuntu 20.04.3 LTS
RTX 3090
Python 3.9
PyTorch 1.12.0
habitat-sim and habitat 0.2.2

What's more, my config follows the default sound-spaces/ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth_ddppo.yaml. Is this normal? I would be very grateful if you could give me some help.
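A quick way to see where the time goes is to read the cumulative counters straight out of the log. The sketch below parses two lines copied from the log above and reports the share of time spent in the environment, the overall frames per second, and the frames collected per update. The function names are hypothetical, and it assumes that env-time plus pth-time accounts for most of the wall-clock time.

```python
import re

# Two lines copied verbatim from the training log in this issue.
FIRST = "2022-08-02 05:34:25,092 update: 6770    env-time: 487357.394s   pth-time: 12964.147s    frames: 5078250.0"
SECOND = "2022-08-02 05:45:39,023 update: 6780    env-time: 488018.482s   pth-time: 12976.478s    frames: 5085750.0"

PATTERN = re.compile(
    r"update:\s*(?P<update>\d+)\s+env-time:\s*(?P<env>[\d.]+)s\s+"
    r"pth-time:\s*(?P<pth>[\d.]+)s\s+frames:\s*(?P<frames>[\d.]+)"
)


def parse_line(line):
    """Extract the cumulative counters from one 'env-time' log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an env-time log line: " + line)
    return {
        "update": int(match["update"]),
        "env": float(match["env"]),
        "pth": float(match["pth"]),
        "frames": float(match["frames"]),
    }


def summarize(line):
    """Share of time spent stepping environments, and overall frames per second."""
    stats = parse_line(line)
    total = stats["env"] + stats["pth"]  # assumption: these two dominate wall-clock time
    return {"env_share": stats["env"] / total, "fps": stats["frames"] / total}


def frames_per_update(first, second):
    """Frames collected per policy update between two log lines."""
    a, b = parse_line(first), parse_line(second)
    return (b["frames"] - a["frames"]) / (b["update"] - a["update"])


if __name__ == "__main__":
    print(summarize(FIRST))
    print(frames_per_update(FIRST, SECOND))
```

On these two lines the environment accounts for roughly 97% of the logged time, which suggests that simulation rather than the PyTorch update is the bottleneck in this run.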

dosssman commented 2 years ago

Hey there. Not the author, but I have been running similar experiments on a similar setup (also a 3090), but based on mp3d. It gets around 25 FPS though. I also trained for around 6 days and it reached around 13 million.

The main difference I spotted is that in the audiogoal_depth_ddppo.yaml for mp3d, NUM_PROCESSES is set to 10 instead of 5 as in the replica config. NUM_PROCESSES defines the number of environments that run in parallel for the agent to interact with. Since your NUM_PROCESSES is half of mine, I think the speed you are getting is quite consistent with that of my experiment too.

Now, given how good your curves already look, namely Metrics/success, I would say the agent is learning pretty well and you might not need to train for that long. Depending on your use case, you could just train for 2 million steps instead ...

zhibinQiu commented 2 years ago

@dosssman Sorry for replying so late. An interesting phenomenon: when I set num_processes to 9 (there are 9 training scenes in Replica), training seems to converge more slowly and the results seem worse (the blue line).

image
dosssman commented 2 years ago

Hey @zhibinQiu , no worries.

Do you mean 9 parallel environments ?

A wild guess is that fewer parallel environments could mean that the agent has less data to use for its update, which in turn leads to it learning more slowly than if num_envs = 10. Namely, with NUM_PROCESSES = 10, the PPO algorithm updates the policy / value network using a rollout size of 10 * 150 = 1500, but with NUM_PROCESSES = 9, it has 150 fewer samples to work with.
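The arithmetic in that guess can be written out as a tiny sketch. It assumes, as the comment does, 150 steps per environment per update; `rollout_size` is a hypothetical helper, not part of the sound-spaces code.

```python
def rollout_size(num_processes, num_steps=150):
    """Samples available to one PPO update: parallel environments times steps per environment."""
    return num_processes * num_steps


if __name__ == "__main__":
    # Compare the settings mentioned in this thread.
    for n in (5, 9, 10):
        print(n, rollout_size(n))
```

With NUM_PROCESSES = 5 this gives 750 samples per update, which matches the log in the original post, where frames grow by 7500 every 10 updates.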

Also, reducing NUM_PROCESSES might not necessarily make it faster, because one of the time-consuming parts is the RNN forward loop, which cannot be parallelized on the GPU.

zhibinQiu commented 2 years ago

@dosssman The num_processes I used before was 5 (the orange line); now it is changed to 9, but training seems to be slower 😂

dosssman commented 2 years ago

Yeah, hard to know the reason with all those variables. It could also be that with a lot of environment simulators running, the load on the GPU makes the simulation slower ...

dosssman commented 2 years ago

From the SS2.0 paper:

7.7 Training and Implementation Details of the Navigation Benchmark

In this continuous audio-visual navigation benchmark, we use the AV-Nav agent [14] with the decentralized distributed proximal policy optimization (DD-PPO) [75]. We train the navigation policy for 80 million steps on the same AudioGoal navigation dataset [14] except that the movement and audio are continuous. For continuous navigation, we define success as the agent issuing a stop action within 1m of the goal location. We train the policy on 32 GPUs for 46 hours to converge.

Making it faster will probably require some hardware beefup more than anything else.
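To put the paper's numbers next to the single-GPU throughput reported in this thread, here is a rough back-of-the-envelope sketch. It assumes the 80 million steps are the total across all 32 GPUs and that throughput stays constant; both helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_to_train(total_steps, fps):
    """Wall-clock days needed to collect total_steps at a constant fps."""
    return total_steps / fps / SECONDS_PER_DAY


def implied_fps(total_steps, hours):
    """Aggregate frames per second implied by a step budget and a training time."""
    return total_steps / (hours * 3600)


if __name__ == "__main__":
    paper_steps = 80_000_000
    print(days_to_train(paper_steps, 10))   # at about 10 FPS, as in the Replica log above
    print(days_to_train(paper_steps, 25))   # at about 25 FPS, as reported for mp3d
    aggregate = implied_fps(paper_steps, 46)
    print(aggregate, aggregate / 32)        # aggregate and per-GPU throughput in the paper setup
```

Under these assumptions, 80 million steps would take on the order of three months at about 10 FPS, and the paper's setup works out to roughly 15 FPS per GPU, which is in the same range as the single-GPU numbers discussed here.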

zhibinQiu commented 2 years ago

@dosssman You are so right, I was running multiple programs on one graphics card 😂. Thank you for all your help. I will put this question aside for a while. I would like to know how to add another sound source using SoundSpaces 2.0. If you have time, can you point me to the specific code location? I have spent a long time on this 😭

dosssman commented 2 years ago

I am not doing much haha 😅

By "adding another sound source", do you mean having another sound source on top of the ringing telephone at the same time?

Personally, I am more interested in having different types of sound sources for my use case. In case you have not seen it already, the closest thing I could find to that goal was Semantic Audio-Visual Navigation (SAVi) https://vision.cs.utexas.edu/projects/semantic-audio-visual-navigation/, which has support for other sound sources. Unfortunately, I could not get it to run using the SS2.0 continuous simulator (#85), so I have fallen back to SS1.0's dataset-based variant for now.

I will let you know if I find anything related to adding and using other sound sources, anyway. Best of luck on your side too.

ChanganVR commented 2 years ago

@zhibinQiu The acoustic propagation engine for SoundSpaces 2.0 runs on the CPU. If you run too many jobs on the same node, the simulation time will increase. You can play around with the tradeoff between simulation efficiency and accuracy by tuning the parameters.

@zhibinQiu @dosssman Sound rendering for multiple sources is additive. To compute the sound received from multiple sound sources, you basically need to compute the IR for each source location, convolve each IR with its source sound, and add the waveforms together. Both SS 1.0 and SS 2.0 support this.
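The additive rendering described above can be sketched as follows. This is a minimal single-channel illustration in plain Python, not the SoundSpaces API: it assumes the impulse responses have already been obtained from the simulator as lists of samples, and `convolve` and `mix_sources` are hypothetical helpers. For real audio, `numpy.convolve` or `scipy.signal.fftconvolve` would be the practical choice, and binaural IRs would be handled one channel at a time.

```python
def convolve(signal, impulse_response):
    """Direct (full) convolution of a source waveform with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix_sources(source_waveforms, impulse_responses):
    """Render each source through its own IR, then sum the results sample by sample."""
    if len(source_waveforms) != len(impulse_responses):
        raise ValueError("need exactly one impulse response per source")
    rendered = [convolve(src, ir) for src, ir in zip(source_waveforms, impulse_responses)]
    mix = [0.0] * max(len(r) for r in rendered)
    for r in rendered:
        for k, value in enumerate(r):
            mix[k] += value  # shorter renderings are implicitly zero-padded
    return mix


if __name__ == "__main__":
    # Toy data: two short "sources", each with its own toy IR.
    telephone, telephone_ir = [1.0, 0.0, 0.0], [1.0, 0.5]
    second_source, second_ir = [0.0, 1.0], [2.0]
    print(mix_sources([telephone, second_source], [telephone_ir, second_ir]))
```

The design mirrors the three steps in the comment: one IR per source location, one convolution per source, and a final sum of the waveforms.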