Closed: npmhung closed this issue 3 years ago.
Hi @npmhung ,
It would be great to have a few more details to help with debugging:

- What does `nvidia-smi` show while running the experiment?
- Are you setting `max_sampler_processes_per_worker = 8`?

For reference, I ran

```
python main.py projects/objectnav_baselines/experiments/robothor/objectnav_robothor_rgb_resnetgru_ddppo.py
```

on a two-GPU machine and obtained ~500 FPS:
```
[08/30 14:45:23 INFO:] train 30720 steps 0 offpolicy: total_loss 0.248 dist_to_target 3.79 ep_length 5.67 reward 0.105 spl 0.0121 success 0.0163 total_reward 0.105 lr 0.0003 ppo_loss/action -0.000483 ppo_loss/entropy -1.79 ppo_loss/ppo_total 0.248 ppo_loss/value 0.533 elapsed_time 31.8s approx_fps 483 [runner.py: 818]
```
and this 500 number should increase during training (it starts slower as the cache has to be populated).
I tried your command, except that I had to restrict `num_train_processes` to 30 because of an OOM error.
Machine specs:

- Ubuntu 18.04
- CPU: Intel(R) Xeon(R) E5-2623 v4 @ 2.60GHz (4 cores per socket, 2 threads per core, 8 CPUs)
- RAM: 64 GB

FPS: ~90.
Furthermore, in the function `get_open_x_displays` from `ithor_util.py`, I needed to set `open_display_strs = ['1']` because I couldn't start the X server on GPU 0.
Hi @npmhung,
I think you must be limited by the number of CPU cores in your setup. Can you try lowering `max_sampler_processes_per_worker`?
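Purely as an illustration (the helper name and the heuristic below are my own, not part of AllenAct), a back-of-the-envelope way to relate this setting to the available core count:

```python
import os

def suggested_sampler_processes(num_gpu_workers, cores=None, reserve=2):
    """Rough heuristic (an assumption, not AllenAct's actual logic): leave a
    couple of cores free for the trainer/driver process and split the
    remaining cores evenly across the GPU workers."""
    cores = cores if cores is not None else (os.cpu_count() or 1)
    return max(1, (cores - reserve) // max(1, num_gpu_workers))

# With 8 CPUs and 2 GPU workers (as in the reported setup), this suggests
# ~3 samplers per worker, so a value of 8 likely oversubscribes the CPUs.
print(suggested_sampler_processes(num_gpu_workers=2, cores=8))  # 3
```

The exact split matters less than the general point: once the number of sampler processes exceeds the number of physical cores, extra parallelism tends to hurt rather than help.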
Yes, I'd agree with @jordis-ai2 here; we generally get FPS gains by increasing parallelism (which is CPU intensive and thus relies on there being a larger number of cores). As Jordi suggested, it might be worth playing around with different values of `max_sampler_processes_per_worker` to see if you can get an FPS bump. Otherwise you'll likely have to change the model or how AI2-THOR is simulated. E.g., we use
```python
CAMERA_WIDTH = 400
CAMERA_HEIGHT = 300
SCREEN_SIZE = 224
```
in `projects/objectnav_baselines/experiments/objectnav_base.py`. You might get a boost by changing this to
```python
CAMERA_WIDTH = 300
CAMERA_HEIGHT = 225
SCREEN_SIZE = 224
```
which would cause THOR to render at a lower resolution while preserving the aspect ratio. You could try rendering at an even lower resolution (although you'd likely have to change the CNN used by the ObjectNav model to compensate for this), e.g. something like
```python
CAMERA_WIDTH = 152
CAMERA_HEIGHT = 114
SCREEN_SIZE = 112
```
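As a quick sanity check (illustrative only, not part of the codebase), all three `CAMERA_WIDTH`/`CAMERA_HEIGHT` pairs above preserve the 4:3 aspect ratio exactly, so only the resolution changes:

```python
from fractions import Fraction

# The (CAMERA_WIDTH, CAMERA_HEIGHT) pairs suggested above; each should
# reduce exactly to 4:3 so geometry is unchanged and only resolution drops.
for w, h in [(400, 300), (300, 225), (152, 114)]:
    assert Fraction(w, h) == Fraction(4, 3)
    print(f"{w}x{h}: 4:3 preserved")
```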
@jordis-ai2 Thank you! I will try that.
@Lucaweihs Thank you for your suggestion. I have other questions:
1) Can my model, trained with a different camera width/height, be tested on the test set of the ObjectNav challenge normally?
2) Is it fair/unfair to compare my model trained with this configuration against other models on the leaderboard?
Hi @npmhung,
- Can my model, trained with a different camera width/height, be tested on the test set of the ObjectNav challenge normally?
Yes, but only if (1) you preserve the 4:3 aspect ratio and (2) you render images at an equal or lower resolution. If you change the aspect ratio or use higher-resolution images, then your model has more information than is available to other models, and this would be considered a rule violation.
Note that the above is actually a slight oversimplification: you can train at whatever resolution/aspect ratio you want, so long as you test your model using the standard aspect ratio and equal-or-lower-resolution images.
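To make the two conditions concrete, here is a small illustrative check (my own helper, not an official challenge utility, and not a substitute for the actual rules):

```python
from fractions import Fraction

def is_challenge_legal(width, height):
    """Illustrative check of the two conditions described above: keep the
    standard 4:3 aspect ratio and do not exceed the default 400-pixel-wide
    render (at 4:3, width <= 400 also implies height <= 300)."""
    return Fraction(width, height) == Fraction(4, 3) and width <= 400

print(is_challenge_legal(300, 225))  # True: 4:3 and lower resolution
print(is_challenge_legal(640, 480))  # False: 4:3 but higher resolution
print(is_challenge_legal(400, 400))  # False: aspect ratio changed
```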
- Is this fair/unfair to compare my model trained on this configuration with other models in the leaderboard?
It would potentially be unfair to your model (as it sees lower-resolution images) but should be fair to the other models on the leaderboard. It's not unheard of for people to use relatively low-resolution images when training navigation models, for precisely the motivation you have (higher FPS + lower memory requirements).
@Lucaweihs Please correct me if I'm wrong. As you said above, the images are resized to 224x224 (`SCREEN_SIZE`). Therefore, I think the parameter `SCREEN_SIZE` (which is also the input size?), not `CAMERA_WIDTH`/`CAMERA_HEIGHT`, decides how much the model can see. Is that correct?
Besides, how many CPU cores were you using to get 500 FPS?
Hi @npmhung,
We render the images at 400x300 from AI2-THOR and then, for our models, post-process them to be of size 224x224. I should also clarify: this resizing isn't a crop; the images are squeezed so that they fit the 224x224 shape. Others are free to use the full 400x300 image if they'd like.
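To make the "squeeze" concrete, here is a minimal pure-Python sketch (nearest-neighbor sampling only, for illustration; the real pipeline presumably uses proper interpolation):

```python
def squeeze_resize(img, out_w=224, out_h=224):
    """Nearest-neighbor 'squeeze' resize: the whole image is mapped onto the
    target shape with no cropping, so a 4:3 frame becomes 1:1 but no region
    of the image is discarded. Illustrative only."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Fake 400x300 "frame" standing in for an AI2-THOR render.
frame = [[(x, y) for x in range(400)] for y in range(300)]
out = squeeze_resize(frame)
print(len(out), len(out[0]))  # 224 224
```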
Besides, how many CPU cores were you using to get 500 FPS?
The machine I was using has 36 cores.
As mentioned in the title, I get a low FPS when training a model for the object navigation task, even though I have 2 GTX 1080 Ti GPUs.
I'm setting `max_sampler_processes_per_worker = 8`.
Are there any other ways to increase the FPS?