facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

gpu2gpu_transfer in CameraSensorSpec #1881

Open desaixie opened 1 year ago

desaixie commented 1 year ago

Habitat-Sim version

vx.x.x

Habitat is under active development, and we advise users to restrict themselves to stable releases. Are you using the latest release version of Habitat-Sim? Your question may already be addressed in the latest version. We may also not be able to help with problems in earlier versions because they sometimes lack the more verbose logging needed for debugging.

Main branch contains 'bleeding edge' code and should be used at your own risk.

Docs and Tutorials

Did you read the docs? https://aihabitat.org/docs/habitat-sim/

Did you check out the tutorials? https://aihabitat.org/tutorial/2020/

Perhaps your question is answered there. If not, carry on!

❓ Questions and Help

I noticed that there is a parameter gpu2gpu_transfer in CameraSensorSpec. If I set it to True, then sim.step() returns the camera observation image as a PyTorch tensor on the GPU instead of a numpy ndarray. I am interested in learning more details here and confirming my understanding. Does this mean that the simulator renders the observation on the GPU, and it is directly converted to a PyTorch tensor, instead of it getting copied to CPU and converted to a numpy ndarray? Therefore, enabling this option could save me the time copying from GPU to CPU and from CPU to GPU? Is this benchmarked, i.e. how much of a performance boost would this option bring?
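(For context, a minimal sketch of what enabling the option looks like, assuming a habitat-sim build with CUDA support; the scene path, sensor uuid, and resolution below are placeholders, not values from this thread.)

```python
import habitat_sim

# Placeholder: point this at a scene you actually have on disk.
scene_path = "data/scene_datasets/habitat-test-scenes/skokloster-castle.glb"

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = scene_path
sim_cfg.gpu_device_id = 0

rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "rgb"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]
rgb_spec.gpu2gpu_transfer = True  # keep observations on the GPU instead of copying to host numpy

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec]

sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))

obs = sim.step("move_forward")
rgb = obs["rgb"]
# With gpu2gpu_transfer=True this should be a torch.Tensor on a CUDA device;
# with it off, it is a numpy ndarray on the host.
print(type(rgb), getattr(rgb, "device", None))
```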

aclegg3 commented 1 year ago

Hey @desaixie,

Yes, Habitat supports keeping the observation data on the GPU to avoid expensive copies. This is a significant optimization for training time.

@erikwijmans @Skylion007 @mosra may have more details and/or an estimate of benchmark values.

Skylion007 commented 1 year ago

Does this mean that the simulator renders the observation on the GPU, and it is directly converted to a PyTorch tensor, instead of it getting copied to CPU and converted to a numpy ndarray?

Yes

Therefore, enabling this option could save me the time copying from GPU to CPU and from CPU to GPU?

Yes

Is this benchmarked, i.e. how much of a performance boost would this option bring?

Yes, you can benchmark it with the benchmark.py script. Performance depends a lot on whether copying is the bottleneck, the speed of your RAM/VRAM, and the size of the observations you are rendering. The reason it's not the default is that it allocates a CUDA context in every subprocess, which has a non-zero VRAM overhead (roughly 300-500 MB per process).
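(A rough, scene-specific way to see the saved copy cost, if you don't want to run the full benchmark script: the loop below is only a sketch and assumes a `sim` built as in the earlier snippet with an "rgb" color sensor and PyTorch with CUDA available. When gpu2gpu_transfer is off, the explicit host-to-device copy inside the loop is exactly the overhead the option removes; compare the average step time with the option on and off.)

```python
import time

import torch


def mean_step_time(sim, n=100, uuid="rgb"):
    """Average seconds per sim.step, including getting the observation onto the GPU."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n):
        obs = sim.step("move_forward")
        rgb = obs[uuid]
        if not torch.is_tensor(rgb):
            # gpu2gpu_transfer=False: the observation is a host numpy array,
            # so training code pays for this host -> device copy every step.
            rgb = torch.as_tensor(rgb, device="cuda")
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n
```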