Open · desaixie opened this issue 1 year ago
Hey @desaixie,
Yes, Habitat supports keeping the observation data on the GPU to avoid expensive copies. This is a significant optimization for training time.
@erikwijmans @Skylion007 @mosra may have more details and/or an estimate of benchmark values.
Does this mean that the simulator renders the observation on the GPU, and it is directly converted to a PyTorch tensor, instead of it getting copied to CPU and converted to a numpy ndarray?
Yes
Therefore, enabling this option could save me the time copying from GPU to CPU and from CPU to GPU?
Yes
Is this benchmarked, i.e. how much performance boost would this option bring?
Yes, you can benchmark it with the benchmark.py script. Performance depends a lot on whether copying is the bottleneck, the speed of your RAM/VRAM, and the size of the observations you are rendering. The reason it's not the default is that it allocates a CUDA context in every subprocess, which has a non-zero VRAM overhead (roughly 300-500 MB) per process.
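If you want a number for your own setup, here is a minimal timing sketch, not the official benchmark: it builds the same simulator twice with `gpu2gpu_transfer` off and on and compares steps per second. The scene path is a placeholder, and it assumes a Habitat-Sim build with CUDA support:

```python
# A minimal sketch (not the official benchmark.py): times sim.step()
# with gpu2gpu_transfer off vs. on. "path/to/scene.glb" is a placeholder.
import time

import habitat_sim


def make_sim(gpu2gpu: bool) -> habitat_sim.Simulator:
    sim_cfg = habitat_sim.SimulatorConfiguration()
    sim_cfg.scene_id = "path/to/scene.glb"  # placeholder scene file

    rgb_spec = habitat_sim.CameraSensorSpec()
    rgb_spec.uuid = "rgb"
    rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
    rgb_spec.resolution = [480, 640]
    rgb_spec.gpu2gpu_transfer = gpu2gpu  # the option being benchmarked

    agent_cfg = habitat_sim.agent.AgentConfiguration(
        sensor_specifications=[rgb_spec]
    )
    return habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))


for gpu2gpu in (False, True):
    sim = make_sim(gpu2gpu)
    n_steps = 100
    start = time.perf_counter()
    for _ in range(n_steps):
        sim.step("move_forward")
    elapsed = time.perf_counter() - start
    print(f"gpu2gpu_transfer={gpu2gpu}: {n_steps / elapsed:.1f} steps/s")
    sim.close()
```

Expect the gap to widen as observation resolution grows, since that increases the amount of data that would otherwise cross the GPU-CPU boundary.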
Habitat-Sim version
vx.x.x
Habitat is under active development, and we advise users to restrict themselves to stable releases. Are you using the latest release version of Habitat-Sim? Your question may already be addressed in the latest version. We may also not be able to help with problems in earlier versions because they sometimes lack the more verbose logging needed for debugging.
Main branch contains 'bleeding edge' code and should be used at your own risk.
Docs and Tutorials
Did you read the docs? https://aihabitat.org/docs/habitat-sim/
Did you check out the tutorials? https://aihabitat.org/tutorial/2020/
Perhaps your question is answered there. If not, carry on!
❓ Questions and Help
I noticed that there is a parameter `gpu2gpu_transfer` in `CameraSensorSpec`. If I set it to `True`, then `sim.step()` would return the camera observation image as a PyTorch tensor on the GPU instead of a numpy ndarray. I am interested in learning more details here and confirming my understanding. Does this mean that the simulator renders the observation on the GPU, and it is directly converted to a PyTorch tensor, instead of being copied to the CPU and converted to a numpy ndarray? Therefore, enabling this option could save me the time of copying from GPU to CPU and from CPU to GPU? Is this benchmarked, i.e., how much of a performance boost would this option bring?
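For what it's worth, a small sketch of how one could check this directly; the scene path is a placeholder, and it assumes Habitat-Sim was built with CUDA support:

```python
# A minimal sketch of checking what sim.step() returns when
# gpu2gpu_transfer is True. "path/to/scene.glb" is a placeholder.
import habitat_sim
import torch

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "path/to/scene.glb"  # placeholder scene file

rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "rgb"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]
rgb_spec.gpu2gpu_transfer = True  # keep rendered frames on the GPU

agent_cfg = habitat_sim.agent.AgentConfiguration(sensor_specifications=[rgb_spec])
sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))

rgb = sim.step("move_forward")["rgb"]
print(type(rgb), rgb.is_cuda)  # expect a torch.Tensor reporting is_cuda=True
sim.close()
```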