MIT-SPARK / Hydra

BSD 2-Clause "Simplified" License

[QUESTION] Recommended system requirements for running Hydra? #24

Closed MikeDmytrenko closed 1 year ago

MikeDmytrenko commented 1 year ago

What are the recommended best-case system requirements for running Hydra on map sizes comparable to the uHumans datasets with VIO enabled (e.g. Kimera-VIO)? Specifically, information like CPU, storage access speed, RAM, and GPU, in case we want to run semantic segmentation live.

In addition, what difference would running the above on bare metal, container or virtual machine make?

Thanks in advance!

nathanhhughes commented 1 year ago

I think unfortunately there's no one answer: it really depends on what you require in terms of map resolution, accuracy, and latency. We've shown that the frontend of Hydra can operate at up to 5 Hz on an Nvidia Xavier NX, but the 3D metric-semantic reconstruction isn't included in that (and is a little slower). We've also run Hydra live at ~2 Hz with a 2D semantic segmentation network on the Xavier (though with a MobileNet variant instead of the HRNet model we used in the Hydra paper). We haven't run with Kimera-VIO in the loop (we were using T265 odometry instead), but there should be enough headroom to run Kimera-VIO alongside everything else.

Roughly speaking, I'd expect the minimum requirement to be something with CPU/GPU compute comparable to a Xavier, though you may have to tweak the voxel size a little to keep things performant. Storage access speed shouldn't matter (we don't write much to disk until Hydra exits). Finally, the backend isn't necessarily going to run in real time (optimizations after a loop closure will likely take a while).

More specifically:

It's also worth noting that we're actively working on improvements to Hydra (and additional profiling on the Xavier) that should hopefully be released soon.

In addition, what difference would running the above on bare metal, container or virtual machine make?

As far as I know, there's almost no difference between running on bare-metal Linux and running in a container. For what it's worth, I ran the experiments for the paper in an LXC container and normally use containers for development. A VM will come with some sort of performance penalty (though typically a very slight one), but it really depends on how the virtual machine is set up. I haven't bothered to do any detailed profiling of the differences any of these make, as we're targeting bare-metal Linux on all of our robots.

MikeDmytrenko commented 1 year ago

Great, thanks for your answer! That's very useful info for when we move back to edge devices :)

But let's say we want to run Hydra on synthetic data from the simulation at the original speed on a ground station with "unlimited resources". We'd want a relatively fast backend and good mesh detail. For example, if I want a well-detailed mesh, I've noticed that the rosbag often has to be played at a lower rate to give Hydra enough time to do its thing. At higher playback speeds my CPU goes to 100% and I start skipping frames.
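For reference, the lower playback rate mentioned above can be set directly with rosbag's `--rate` flag (the bag filename below is a hypothetical placeholder):

```shell
# Play back at half real-time speed so Hydra has more processing headroom;
# --clock publishes simulated time so downstream nodes stay in sync.
rosbag play --rate 0.5 --clock uhumans2_office.bag
```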

Also, we've noticed that depending on the map size, RAM usage is very high, sometimes reaching 100 GB. We just don't want to deal with hardware restrictions while we're experimenting with Hydra, so with this in mind what ideal hardware setup would you pick?

nathanhhughes commented 1 year ago

We'd want a relatively fast backend and good detail of the mesh

In the current public implementation, there are a couple things that are slowing down the backend (and Hydra in general):

I'd guess that, at least for now, that's going to limit the backend to ~2 Hz or lower. Some of the changes I mentioned in my last comment address these two issues.

For example, in case I want well-detailed mesh, what I've noticed is that the rosbag often has to be played at lower rate to give Hydra enough time to do its thing.

I've gone down to a 5 cm voxel size for the TSDF/mesh/GVD before, and I certainly wouldn't try to go below that with the current implementation. If I remember correctly, 5 cm was still pretty painful to run and may not have been real-time. Part of the fundamental limit is that GVD extraction is single-threaded and scales poorly with the voxel size (in general, most of the reconstruction does). The GVD voxel size theoretically doesn't have to match the TSDF voxel size, which might help boost performance a little, but that would take a bit of development.
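To give a sense of why the voxel size dominates here, a quick back-of-the-envelope sketch (the room extents below are made-up illustrative numbers, not measurements of any actual scene):

```python
# Rough scaling of reconstruction work with voxel size.
def voxel_count(extent_m, voxel_size_m):
    """Number of voxels needed to tile a box at the given resolution."""
    x, y, z = extent_m
    return round(x / voxel_size_m) * round(y / voxel_size_m) * round(z / voxel_size_m)

office = (10.0, 10.0, 3.0)           # hypothetical 10 m x 10 m x 3 m volume
at_10cm = voxel_count(office, 0.10)  # 100 * 100 * 30  =   300,000 voxels
at_5cm  = voxel_count(office, 0.05)  # 200 * 200 * 60  = 2,400,000 voxels
print(at_5cm / at_10cm)              # 8.0: halving the voxel size is ~8x the work
```

That cubic growth is why a single-threaded GVD pass becomes the bottleneck so quickly as the voxel size shrinks.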

Also, we've noticed that depending on the map size, RAM usage is very high, sometimes reaching 100 GB

I'd be surprised if Hydra itself was directly the culprit (at 10 cm resolution for the mesh, the office scene should use ~1.4 GB of RAM). The typical breakdown for memory is:

It might be a little higher for the public version (like 10%) because of how we handle the mesh vertices and faces, but there shouldn't be significant differences.

That said, if the mesh resolution is much finer than 10 cm (e.g. ~1 cm), I could see significant amounts of memory being used. I still think it's more likely, though, that running at higher rates is causing input messages (either pointclouds or mesh updates) to queue up because either voxblox or Hydra isn't processing the updates fast enough.
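One way to sanity-check the fine-mesh scenario: surface mesh memory grows roughly with surface area divided by the squared voxel size, so a quadratic extrapolation of the ~1.4 GB figure at 10 cm (a rough estimate, not a measurement) already lands in the observed range at ~1 cm:

```python
# Back-of-the-envelope mesh memory extrapolation (quadratic in 1/voxel_size).
base_mem_gb = 1.4      # reported office-scene usage at 10 cm (from this thread)
base_voxel_cm = 10.0

def est_mesh_mem_gb(voxel_cm):
    # Surface mesh vertices/faces grow ~quadratically as the voxel size shrinks.
    return base_mem_gb * (base_voxel_cm / voxel_cm) ** 2

print(round(est_mesh_mem_gb(1.0)))  # 140 (GB): a ~1 cm mesh alone could explain ~100 GB
```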

what ideal hardware setup would you pick?

I typically run on the workstation reported in the paper: a Ryzen Threadripper 3960X (so 24 cores, 48 threads) and 64 GB of RAM. I could see the utility of more RAM if you're planning on working with larger meshes than we normally do, and of multiple GPUs on the workstation if you're running with a simulator in the loop (I haven't felt constrained by my GPU, though). It's worth noting that I typically limit my container's RAM usage to ~48 GB and core usage to 16 threads. The number of available threads does help speed up the reconstruction (marching cubes and the TSDF integration both use as many threads as are available), so having a CPU with a lot of cores may help your use case.
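The container limits described above (LXC is what's used here) can be mirrored in Docker with its standard resource flags; the image name below is just a placeholder, not what Hydra ships with:

```shell
# Cap a container at ~48 GB of RAM and 16 CPUs' worth of time,
# mirroring the limits mentioned above. "ros:noetic" is a placeholder image.
docker run --rm -it \
  --memory=48g \
  --cpus=16 \
  ros:noetic \
  bash
```

Note that `--cpus` limits total CPU time rather than pinning specific cores; `--cpuset-cpus` is the stricter alternative if you want hard core affinity.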