RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.33k stars 1.26k forks source link

Reducing Build Weight with Drake as an External Dependency #21868

Open wrangelvid opened 2 months ago

wrangelvid commented 2 months ago

Drake comes with a strong ecosystem, but similar to the solvers there are parts of drake that could be lighter and optionally excluded in the build. Note that this is specific for drake as an external dependency. I hope we can strategize in this issue an approach and where we can drop baggage.

Here are a few heavy things that I think could be excluded. Feel free to suggest more!

Default Assets: While going through package_map, I found that find_resources has a few steps to look for assets, default paths, libmarker.so, etc. Maybe some steps could be excluded with a flag?

Meshcat: Meshcat is great, but there are various applications of drake that don't require visualization. From my understanding, this is pretty easy to exclude. We just don't reference meshcat and excluded it via add_default_workspace ✅

Renderers: I was stunned to find out that just building MultibodyPlant pulls in vtk_internals, then X11, OpenGL, GLX to parse VTK meshes and glTFs. Would it be possible to make VTK mesh parsing optional and switch to tinyglTF (which also supports glb) files? That way, we can avoid pulling in the whole rendering pipeline. Obviously, I have to look at more corners to see where VTK is used other than for rendering.

Spdlog This is one is interesting. There are flags that check for HAVE_SPDLOG. The only way to exclude it is to create an empty spdlog target which doesn't ship the HAVE_SPDLOG flag. Maybe there could be a more developer friendly and documented approach?

wrangelvid commented 2 months ago

cc @jwnimmer-tri

wrangelvid commented 2 months ago

Renderers: I was stunned to find out that just building MultibodyPlant pulls in vtk_internals, then X11, OpenGL, GLX to parse VTK meshes and glTFs. Would it be possible to make VTK mesh parsing optional and switch to tinyglTF (which also supports glb) files? That way, we can avoid pulling in the whole rendering pipeline. Obviously, I have to look at more corners to see where VTK is used other than for rendering.

The first step towards VTK independence for MultibodyPlant #21869 is out.

wrangelvid commented 2 months ago

The first step towards VTK independence for MultibodyPlant #21869 is out.

A low hanging fruit is to parse glb files. With tiny_gltf it's just a matter of calling a different function: loader.LoadBinaryFromFile(&model, &err, &warn, argv[1]);

But I see that .glb is tagged as unsupported. At what other places besides loading the mesh would we encounter that? Maybe @SeanCurtis-TRI knows?

jwnimmer-tri commented 2 months ago

Big picture, I believe the shape of this request is for drake/tools/BUILD.bazel to offer more unsupported flags (i.e., flags not covered by CI) which opt-out of default features, akin to these existing flags already offered there:

--define=NO_CLARABEL=ON
--define=NO_CLP=ON
--define=NO_CSDP=ON
--define=NO_IPOPT=ON
--define=NO_NLOPT=ON
--define=NO_OSQP=ON
--define=NO_SCS=ON

In principle that's fine, and we're open to pull requests that implement any flags of interest to downstream bazel users.

As with all not-covered-by-CI code, the requirement is that it doesn't "pollute" the codebase overly much, or sow overly much confusion or doubt for everyday Drake Developers to make progress. In other words, unsupported flags must not get in the way of the supported code and features.

For spdlog in particular, it would be an improvement to refactor the implementation from the current -DHAVE_SPDLOG hack that was added in our early days, to a more conventional --define to match everything else. That would remove the requirement for linking an empty / dummy @spdlog external.

Also note that the --define spelling is the bazel legacy way for option tuning. At some point we should port our flags use to the https://bazel.build/extending/config#user-defined-build-settings spelling instead, but for now I'm satisfied to use legacy spelling uniformly. If there is an upside to the new spelling, it would be OK to upgrade to it as part of this work, but I don't imagine that's required.

wrangelvid commented 2 months ago

I agree, a few more unsupported feature flags would be nice. We would have to choose wisely on what to exclude to minimize the developers’ overhead. The rendering dependencies by themselves reduced my build size by about 80% (on my partial build).

Do you think some minimal CI would be useful? Perhaps using bazel query to verify the dependencies were actually excluded could be useful. In that case, defining flags through the new way might be required since I couldn't get --define to work with bazel query (maybe I haven't tried hard enough).

wrangelvid commented 2 months ago

I’ve also noticed that we have quite a few source files with unused headers and build dependencies. What are your thoughts on incorporating tools like clang_tidy, include-what-you-use, or cppclean, etc.? For build dependencies, I know there’s some nice tooling for java + buildozer, but I couldn’t find anything equivalent for C++.

jwnimmer-tri commented 2 months ago

Do you think some minimal CI would be useful?

The thing we want to avoid is any increase the permutation count of CI jobs (https://drake-jenkins.csail.mit.edu/view/Production/). The permutations cover the basics like different operating systems and debug vs release, but in terms of this issue the relevant permutation option is whether the dependency footprint is "default" (which has no special mention in the CI job name) or else the "everything" CI variant which has all of the optional dependencies enabled (Gurobi, Mosek, OpenMP, OpenUSD, Snopt, etc.). We don't want to add for example a "minimal" CI variant, because of all of the extra CI jobs and babysitting it would need.

On the other hand, if it's possible to add new unit tests (ala the no-spdlog test flavor) to help lock in certain flags, those kinds of things are plausible. It just depends on the "not get in the way too much" metric upthread.

I’ve also noticed that we have quite a few source files with unused headers and build dependencies.

For IWYU in particular, my baseline is that the time investment required to configure it correctly to avoid too many false positives and false negatives on a project of this size is more costly than its benefit. For clang-tidy, my experience several years ago was that the latency of running it was too high to be enabled by default, but maybe its faster now. I don't know cppclean at all.

In general the idea is that we are open to new linters that make life better, but someone would need to sit down and do all of the prep work and configuration fine-tuning so that developers are not impacted by false results, and understand how to use the new tooling.

SeanCurtis-TRI commented 2 months ago

On the subject of VTK dependency. While removing VTK from various sites now holds certain appeal, it has been mooted in the past that we'd use VTK's broader file support to allow Drake to support more file formats (e.g., .stl, .dae, etc.) So, killing VTK from one particular compilation unit may be undone by something more fundamental in the future.

wrangelvid commented 2 months ago

For IWYU in particular, my baseline is that the time investment required to configure it correctly to avoid too many false positives and false negatives on a project of this size is more costly than its benefit. For clang-tidy, my experience several years ago was that the latency of running it was too high to be enabled by default, but maybe it's faster now. I don't know cppclean at all.

I agree that clang-tidy can be extremely slow if configured too broadly, which may explain why I have seen split configurations into fast and slow. Nevertheless, I do like that bazel_clang_tidy comes with caching enabled, which worked quite well on my machine. However, it's a bummer that misc-include-cleaner only handles includes in main files and thus didn't do anything. I also bumped into clangd, but didn't play much with it yet. My worry is that I found a few files in the MultibodyPlant depdency tree that unnecessarily pull dependencies and I can't possibly go through every src and build file in drake to clean it up.

On the subject of VTK dependency. While removing VTK from various sites now holds certain appeal, it has been mooted in the past that we'd use VTK's broader file support to allow Drake to support more file formats (e.g., .stl, .dae, etc.) So, killing VTK from one particular compilation unit may be undone by something more fundamental in the future.

From my understanding we only parse .vtk, .obj and .gtlf files to compute convex hulls and don't leverage VTK there fully. I'd be in favor for a lightweight dependency parsing more file types consistently across drake.

jwnimmer-tri commented 2 months ago

My worry is that I found a few files in the MultibodyPlant depdency tree that unnecessarily pull dependencies ...

This isn't a downside problem for Drake proper, though, which is why we haven't focused on trying to be more precise. Drake as a whole uses a wide baseline of dependencies, all privately vendored into libdrake.so, so culling redundant internal #include edges in that graph is not much of a win, since the code all goes into libdrake.so one way or another anyway.

From my understanding ...

As of today, yes. However, we will eventually (hopefully soon) add *.stl mesh parsing to Drake, and for that our plan is to use VTK to do work.

In general, our metrics for choosing dependencies is getting the job done easily and correctly via a well-maintained external that we can trust. Minimizing the build footprint is not particularly high on the list.

wrangelvid commented 2 months ago

This isn't a downside problem for Drake proper ...

I see your point, although with the drake-external-examples my impression was that there are supported workflows building from source and not referencing libdrake.so. I'll see if I can find a straight forward way to simplify the dependency-graph. If it's not that straight forward, we'll keep it as a tool to vendor-patch drake.

jwnimmer-tri commented 2 months ago

... impression was that there are supported workflows building from source ...

That's true, static linking a subset using Bazel is supported. I guess I would amend my statement then to say that I haven't seen examples of enough spurious dependencies edges to cause any meaningful pain for that use case, so we haven't prioritized trying to be formally precise with a linter.

The easiest solution might just be to fix spurious things that crop up and cause acute pain, but without any linting backstop. I don't imagine that spurious deps will grow back very fast.