RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

ARM64 support #13514

Open jwnimmer-tri opened 4 years ago

jwnimmer-tri commented 4 years ago

We should consider supporting ARM64 / AArch64 for a subset of Drake, and decide what timeline might make sense for offering such support.

See #10435 and #20075 for sample requests, though I've also received a couple of offline request pings as well.

Given that the core Drake developer team does not currently use this architecture, I'd expect such support to be facilitated by the core team (e.g., code review and Jenkins), but spearheaded via PRs from community contributors.

Please feel free to add additional thoughts below.


Edited to add:

For a while, we advocated running Drake on macOS ARM hardware via Rosetta 2 while ARM64 support matured. As of today, macOS is supported on ARM64 natively; see https://drake.mit.edu/installation.html for details.

The remaining focus of this issue is ARM64 for Ubuntu platforms.

One particularly helpful outcome would be to publish Linux ARM64 wheel files to PyPI.

jwnimmer-tri commented 4 years ago

Related to https://github.com/bazelbuild/bazel/issues/8833.

jamiesnape commented 4 years ago

FTR https://www.apple.com/newsroom/2020/06/apple-announces-mac-transition-to-apple-silicon/.

The Apple developer documentation also has various information on the differences between the arm64 and x86_64 architectures that are applicable to Linux too, e.g., float-to-int conversions.
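
As one concrete illustration of the float-to-int point: in C and C++, converting a NaN or out-of-range floating-point value to an integer is undefined behaviour, and the conversion instructions on the two architectures return different results. The sketch below models in Python what the x86_64 `cvttsd2si` and AArch64 `fcvtzs` instructions produce for a 32-bit destination. The function names are made up for illustration, and this is a model of the hardware behaviour, not Drake code.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def x86_64_double_to_int32(x: float) -> int:
    """Model of cvttsd2si: truncate toward zero; NaN and out-of-range
    inputs yield the "integer indefinite" value 0x80000000 (INT32_MIN)."""
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN
    truncated = math.trunc(x)
    return truncated if INT32_MIN <= truncated <= INT32_MAX else INT32_MIN


def aarch64_double_to_int32(x: float) -> int:
    """Model of fcvtzs: truncate toward zero, saturate on overflow, NaN -> 0."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))
```

In-range values agree on both models; only NaN and overflow differ, which is one plausible source of tests that pass on one architecture and fail on the other.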

joshuagruenstein commented 3 years ago

I think this would be especially useful for people running lots of cloud experiments (e.g., RL people), given AWS EC2 now offers low-cost ARM compute.

aiyer-commits commented 3 years ago

This would enable running drake on android as well I imagine? Would be interested in contributing, but not sure where to start

jamiesnape commented 3 years ago

This would enable running drake on android as well I imagine?

It is necessary, but not necessarily sufficient. Difficult to say what else would be needed at this stage.

Would be interested in contributing, but not sure where to start.

Looking at some of the prerequisites that Drake builds would be a start. I know FCL has issues: https://github.com/flexible-collision-library/fcl/issues/474

aiyer-commits commented 3 years ago

Cool, thanks for pointing me in the right direction!

Arjun

gizatt commented 3 years ago

As part of a side project, I just tried building Drake (arbitrarily, sha 9d5b2e from yesterday) on a Raspberry Pi 4B, which has a pretty dinky 1.5 GHz 4-core ARM64 CPU. I have the version with 4GB of RAM + added 8GB of swap. I'm running Ubuntu 20.04.2 on it.

Kudos to the build folks, it built with relatively superficial modifications! I just tested a 3D sim (simulating a simple point-foot quadruped with collision) with meshcat viz and some interactive sliders, so a lot of the moving parts seem intact and pleasantly performant (i.e. that sim was near-real-time) given that I'm running Drake on a potato. I'll poke at IK and optimization code when I put more hours on that project.

The specific diffs are here. The changes I needed to make were:

Notes:

jamiesnape commented 3 years ago

You can open an issue about drake-visualizer if the Dockerfile was not working out for you, whether it is ARM-specific or not. My hope is that each Dockerfile is agnostic to the build architecture and will work on pretty much anything 64-bit that Ubuntu supports.

jamiesnape commented 3 years ago

  • I needed to disable -march broadwell to build the math folder.

Not great that we pass that unconditionally. Bazel does support passing flags specific to processor architecture. Probably worthy of an issue.
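
A minimal sketch of making the flag conditional, written in Python rather than in Bazel's own configuration language. `arch_copts` and the mapping are hypothetical; they only illustrate choosing `-march` by processor architecture instead of passing it unconditionally.

```python
import platform

# Hypothetical mapping from machine name to extra compiler flags. Only the
# x86_64 entry comes from this thread; ARM64 gets no -march flag here.
ARCH_COPTS = {
    "x86_64": ["-march=broadwell"],
    "aarch64": [],
    "arm64": [],
}


def arch_copts(machine=None):
    """Return the extra compiler flags for `machine` (default: this host)."""
    machine = machine or platform.machine()
    # Unknown architectures fall back to no extra flags.
    return list(ARCH_COPTS.get(machine.lower(), []))
```

Calling `arch_copts("aarch64")` yields an empty list, so an ARM64 build never sees the Broadwell flag.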

jwnimmer-tri commented 3 years ago

Thanks @gizatt for your experience report and posting the branch. Feedback like this is super helpful for us to understand users' experiences.

mwoehlke-kitware commented 2 years ago

So, I gave this a spin recently. Other than having to disable ibex (which required removing some test stuff; --define=NO_DREAL=ON was not sufficient) due to x86 assembly, and bludgeoning a whole lot of /usr/local paths that need to be /opt/homebrew, it mostly worked. There are 24 test failures, however. At least one of those I believe is a formatting issue and thus probably uninteresting, but it looked like a lot of numeric tests were returning slightly different results.

RussTedrake commented 2 years ago

FTR -- My now extremely small patch to enable bazel build //tools/install/libdrake:libdrake.so //bindings/pydrake and install on apple silicon is here.

jakewelde commented 2 years ago

Hi folks. I've been trying to build Drake in Ubuntu 20.04.03 on a Raspberry Pi Compute Module 4 with 4GB ram + 8GB swap, using @gizatt's diff above. The only differences I'm aware of between their setup and my own are that I'm in 20.04.03 (instead of 20.04.02), and I'm on a Compute Module 4 instead of 4b, but the underlying hardware is the same to my knowledge.

When I run bazel build //..., I get a whole bunch of "conflicting action" errors like the below:

ERROR: file 'systems/framework/_objs/diagram_context_test/diagram_context_test.pic.o' is generated by these conflicting actions:
Label: //systems/framework:diagram_context_test
RuleClass: cc_test rule
Configuration: 7da1b208fd0017bf8859b340625d8b4847fa88eed586ecfbb47bbaa2d0112fcd, 09f6f628bf7b1967fa0878bd8781bf7134141996d34ea5c493b3f294007260c9
Mnemonic: CppCompile
Action key: e0ee33cd4e44065dd0124f31e1b041bf698192be8daad9fc7db6b9657c38f3df
Progress message: Compiling systems/framework/test/diagram_context_test.cc
PrimaryInput: File:[/home/ubuntu/drake[source]]systems/framework/test/diagram_context_test.cc
PrimaryOutput: File:[[<execution_root>]bazel-out/aarch64-opt/bin]systems/framework/_objs/diagram_context_test/diagram_context_test.pic.o
Owner information: ConfiguredTargetKey{label=//systems/framework:diagram_context_test, config=BuildConfigurationValue.Key[7da1b208fd0017bf8859b340625d8b4847fa88eed586ecfbb47bbaa2d0112fcd]}, ConfiguredTargetKey{label=//systems/framework:diagram_context_test, config=BuildConfigurationValue.Key[09f6f628bf7b1967fa0878bd8781bf7134141996d34ea5c493b3f294007260c9]}
MandatoryInputs: are equal
Outputs: are equal

and when I run bazel build //tools/install/libdrake:libdrake.so, after a while I run into the following errors (is this second one possibly related to the x86 assembly issues @mwoehlke-kitware encountered?):

INFO: Analyzed target //tools/install/libdrake:libdrake.so (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/0e5a44dfdb0fa037a3af759ace4c4232/external/dreal/dreal/solver/BUILD.bazel:35:17: Compiling dreal/solver/brancher.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 83 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
/tmp/ccbci1Fk.s: Assembler messages:
/tmp/ccbci1Fk.s:2964: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x22]'
/tmp/ccbci1Fk.s:2965: Error: unknown mnemonic `subsd' -- `subsd v0,v1'
/tmp/ccbci1Fk.s:2966: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x21]'
Target //tools/install/libdrake:libdrake.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2527.997s, Critical Path: 809.85s
INFO: 409 processes: 12 internal, 397 linux-sandbox.
FAILED: Build did NOT complete successfully

Has anyone else encountered similar problems? I know that Drake isn't officially supported on arm64, but it sounds like other folks are having more success with this off-label use than I am. I also wanted to add my support for future official arm64 support if possible - in my lab we are hoping to use Drake on board small aerial vehicles, and that will be very difficult if limited to x86 arch.

Thanks in advance for any suggestions you might be able to offer, and sorry if this is too "off-label" to ask for support!

jwnimmer-tri commented 2 years ago

When I run bazel build //..., I get a whole bunch of "conflicting action" errors like the below:

I don't recognize those errors specifically, but I know that as of Bazel 5.0 there is https://github.com/bazelbuild/bazel/issues/14294 in play. Try adding --notrim_test_configuration to the bazel command line to see if it helps. Also be sure you have #16405 (from a couple of weeks ago) merged in your tree.

ERROR: Compiling dreal/solver/brancher.cc failed: ...

For now, you need to pass --define=NO_DREAL=ON on the bazel command line, on ARM64. Our dReal build is hard-coded to invoke x86_64 assembly at the moment.
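
To check ahead of time whether a dependency embeds x86-only instructions like the ones in the assembler errors quoted in this thread, a rough scan of its sources can help. This Python sketch is a heuristic under stated assumptions: the mnemonic list contains only the two instructions from the log, and `find_x86_mnemonics` is a hypothetical helper, not part of Drake or Bazel.

```python
import re

# Only the mnemonics that appear in the assembler errors quoted in this thread.
X86_ONLY_MNEMONICS = ("ldmxcsr", "subsd")

_PATTERN = re.compile(
    r"\b(" + "|".join(X86_ONLY_MNEMONICS) + r")\b", re.IGNORECASE
)


def find_x86_mnemonics(source_text):
    """Return the sorted, de-duplicated x86-only mnemonics found in the text."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(source_text)})
```

A non-empty result for a source file suggests it will not assemble on ARM64 without a port or a build switch that excludes it.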

jakewelde commented 2 years ago

Thanks very much @jwnimmer-tri for your prompt and helpful reply. I’m away from the hardware I was running this on today but I will try your suggestions as soon as I’m back in front of it. Appreciate your help!

jakewelde commented 2 years ago

Thanks @jwnimmer-tri for those tips - they seem to have mostly solved my problems! (I see the dReal one was actually above but I didn't realize I needed to specify it on the command line at first, sorry). I was able to complete the libdrake.so build without errors, but I'm still having trouble with the complete build - seems like the python bindings are still having some trouble and I'm not sure why.

ubuntu@cm4:~/drake$ bazel build //... --define=NO_DREAL=ON
INFO: Analyzed 8343 targets (1 packages loaded, 4 targets configured).
INFO: Found 8343 targets...
ERROR: /home/ubuntu/drake/bindings/pydrake/multibody/BUILD.bazel:81:21: Compiling bindings/pydrake/multibody/tree_py.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 259 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
INFO: Elapsed time: 26186.649s, Critical Path: 17983.61s
INFO: 1395 processes: 32 internal, 1363 linux-sandbox.
FAILED: Build did NOT complete successfully

If you have any further insights I would appreciate any suggestions. Thanks again for your help with this setup even though it's not officially supported! My main interest is the C++ library, the python bindings would just be a bonus.

jwnimmer-tri commented 2 years ago

gcc: fatal error: Killed signal terminated program cc1plus

This looks like an out of memory (OOM) error. Try running the build with reduced concurrency (e.g., --jobs=1), or else add more swap memory.
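
One way to pick a `--jobs` value is to budget memory per compiler process. The sketch below is a rough heuristic in Python: the 2 GiB-per-job default is an assumption (large translation units such as the Python bindings may need more, in which case `--jobs=1` is the safer choice), and `suggest_jobs` and `host_ram_gib` are hypothetical helpers.

```python
import os


def suggest_jobs(total_ram_gib, cpu_count, gib_per_job=2.0):
    """Suggest a Bazel --jobs value: one job per `gib_per_job` of RAM,
    capped by the CPU count and never less than 1."""
    by_memory = int(total_ram_gib // gib_per_job)
    return max(1, min(cpu_count, by_memory))


def host_ram_gib():
    """Physical RAM of this host in GiB (POSIX only; uses os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


if __name__ == "__main__":
    jobs = suggest_jobs(host_ram_gib(), os.cpu_count() or 1)
    print(f"bazel build //... --jobs={jobs}")
```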

jakewelde commented 2 years ago

Thanks, I'll give that a shot! I think you're right, because my memory usage during the build is around 3.7GB out of 4GB and swap about 7GB out of 8GB. Appreciate the help!

cypressf commented 2 years ago

FTR -- My now extremely small patch to enable bazel build //tools/install/libdrake:libdrake.so //bindings/pydrake and install on apple silicon is here.

Hey @RussTedrake, I used your patch and built with that bazel build command on my M1. It seemed to go smoothly. Ideally, I would install the drake python bindings package into a python venv and be good to go, but I'm having trouble finding the relevant package.

As a quick test I tried adding the bindings folder in the root of the drake project to my python path, but I got an import error when trying to import pydrake.all. Am I missing something obvious here?

PYTHONPATH=./bindings python3
>>> import pydrake.all
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cypressf/repos/drake/bindings/pydrake/all.py", line 31, in <module>
    from .autodiffutils import *
ModuleNotFoundError: No module named 'pydrake.autodiffutils'

I was considering using the suggested cmake python bindings build but I wanted to see if I could get it to work with bazel first.
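
A quick way to tell whether a given submodule is visible on the current path, without triggering the full `pydrake.all` import, is to ask the import system for its spec. This is a small sketch using only the standard library; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the import system can locate `module_name`.

    Note: for a dotted name, find_spec imports the parent package first,
    so a missing or broken parent is reported as not importable.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False
```

If `is_importable("pydrake")` is true but `is_importable("pydrake.autodiffutils")` is false, that would be consistent with the traceback above: the directory on the path contains `all.py` but no `autodiffutils` module beside it.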

RussTedrake commented 2 years ago

The cmake entry point is still the official way to run the install scripts, and it supports setting all of our documented options.

RussTedrake commented 2 years ago

Update: As of #17196, my apple_silicon2 branch patch is no longer needed. 🎉

jwnimmer-tri commented 1 year ago

As of the next release Drake v1.9.0 (due within the next few days), macOS arm64 will be fully supported for both source builds and stable binaries -- except for PyPI wheels which are still in progress (#17906).

RussTedrake commented 1 year ago

Did we resolve the mac permissions issue that was discussed in Slack? https://drakedevelopers.slack.com/archives/C01CSPX85N3/p1664829545264099?thread_ts=1664236215.495399&cid=C01CSPX85N3 (Was using wget a solution)?

jwnimmer-tri commented 1 year ago

As of this morning, https://drake.mit.edu/from_binary.html#stable-releases contains instructions for macOS users to run curl for downloading.

adeeb10abbas commented 1 year ago

Is the scope of this issue just supporting build from source or arm64 binaries as well?

jwnimmer-tri commented 1 year ago

It could be either one. It's mostly a feature request for people to speak up about their interests.

However, I don't think we'd close it with only build-from-source support enacted. If we don't have ARM64 Ubuntu official binaries published, we should keep it open for that request at minimum.

jakewelde commented 1 year ago

Sorry if I’m out of the loop on this, but what is the current feeling among the Drake development team about arm64 Ubuntu support in the near to medium term?

I’ve tried some of the off-label usage suggested above but haven’t been able to build successfully to completion. I’m just asking again because based on @jwnimmer-tri’s most recent reply above, I wondered if official support for Ubuntu on arm64 was closer than I thought (in either build from source or binary format). I’m in the process of making some hardware/software co-design decisions, and it would be great to know if arm64 hardware and official Drake support will soon cease to be mutually exclusive options.

Thanks for your thoughts!

jwnimmer-tri commented 1 year ago

I think @adeeb10abbas could give a more definitive answer on the current state of affairs, but as I understand it as of #18264 or later, installing from source using cmake && make will succeed on Ubuntu 22 on arm64, except that:

(1) You need to install Bazel by hand first; the setup/ubuntu/install_prereqs doesn't install it automatically.

(2) You need to pass --define NO_DREAL=ON to Bazel somehow, maybe a user.bazelrc file in the Drake source tree.

Writing that out now, I realize we can probably fix (2) to happen automatically. I'll see what I can do.

If you have a build error, I think it's fine to post it in this thread and the people involved can try to help. (Be sure to include a complete recipe to reproduce the problem.)


As for the support roadmap, there's nothing set in stone. We're willing to accept PRs that make off-label usage work better (assuming they don't regress the on-label uses), as in #18261, #18264, #18221. Given that we have macOS arm64 under official support, it seems likely that Ubuntu arm64 will remain in pretty good shape.

The question of official support is really just about CI costs, since anything official requires regular CI coverage, nightly builds, etc. I haven't yet started to create a budget for those costs, but I can have the team start investigating that soon.

jakewelde commented 1 year ago

Thank you very much @jwnimmer-tri for the detailed and helpful response. I will give this another shot with these new tips and report back if I have issues. Really appreciate your candor and assistance!

adeeb10abbas commented 1 year ago

Hi everyone, so as Jeremy mentioned, you should be able to build from source with NO_DREAL=ON as of #18264. Apart from that, everything works as it would on an x86_64 machine.

1) For Bazel, I used bazelisk to install it, which worked great for me!

2) One more thing @jakewelde, I'd recommend compiling on an Apple Silicon machine if possible, unless you know your ARM64 machine is more powerful than that, because it's gonna take a loooong time to compile :) My Parallels VM with a single-core M1 Pro and 8 gigs of RAM took 2 hours :D

I can also share the binaries with ya Jack if you really need to hack something soon :)

3) @jwnimmer-tri regarding the support roadmap (please correct me if I am wrong) - instead of provisioning an ARM64 machine, maybe we can look into cross-compiling on the instances we already have for x86 systems? I would imagine it shouldn't cost significantly more than what it already does?

jwnimmer-tri commented 1 year ago

(3) Our Ubuntu CI is pay-by-the-minute (cloud), not dedicated machines, so it will almost certainly be cheaper to rent the EC2 machines with arm64 hardware versus cross-compiling. It's also not only a question of hardware cost, but also the staff cost to set up, babysit, and debug the CI over time. We might be able to minimize that by only supporting a narrower subset of CI builds at the start.

jakewelde commented 1 year ago

Thank you very much for the additional insights @adeeb10abbas! I don't currently have access to an arm64 mac, but I'm not in a huge rush either. I'll give this a shot compiling on my embedded hardware and see where it gets me. Thanks for the tips! Assuming this works out, I think it's very likely we'll go this route.

jwnimmer-tri commented 1 year ago

One quick update:

Writing that out now, I realize we can probably fix --define NO_DREAL=ON to happen automatically. I'll see what I can do.

In a couple weeks fiddling with dReal will no longer be necessary; we've deprecated it for removal on 2023-02-01 (#18156).

LordAcrobaticTurtle commented 1 year ago

Hi everyone

Just wanted clarification on the --define NO_DREAL=ON argument. I'm attempting to build drake on an Nvidia Jetson TX2 in an Ubuntu 22.04 Docker container. I am using cmake and make to compile the project. The .bazelrc imports the user.bazelrc file, which contains the following two lines.

build --define NO_DREAL=ON
build --jobs=1

However, I'm still getting these errors during compilation

[ 12%] Performing build step for 'drake_cxx_python'
INFO: Analyzed target //:install (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/dev/drake/solvers/BUILD.bazel:758:25: Compiling solvers/no_dreal.cc failed: (Exit 1): cc failed: error executing command (from target //solvers:_dreal_solver_compiled_cc_impl) /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 102 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
solvers/no_dreal.cc:10:28: error: 'DrealSolver' is deprecated: \nDRAKE DEPRECATED: dReal support is being withdrawn from Drake; for details, see https://github.com/RobotLocomotion/drake/pull/18156\nThe deprecated code will be removed from Drake on or after 2023-02-01. [-Werror=deprecated-declarations]
   10 | std::optional<DrealSolver::IntervalBox> DrealSolver::CheckSatisfiability(
      |                            ^~~~~~~~~~~
In file included from solvers/no_dreal.cc:2:
bazel-out/aarch64-opt/bin/solvers/_virtual_includes/_dreal_solver_compiled_cc_impl/drake/solvers/dreal_solver.h:25:1: note: declared here
   25 | DrealSolver final : public SolverBase {
      | ^~~~~~~~~~~
solvers/no_dreal.cc:17:28: error: 'DrealSolver' is deprecated: \nDRAKE DEPRECATED: dReal support is being withdrawn from Drake; for details, see https://github.com/RobotLocomotion/drake/pull/18156\nThe deprecated code will be removed from Drake on or after 2023-02-01. [-Werror=deprecated-declarations]
   17 | std::optional<DrealSolver::IntervalBox> DrealSolver::Minimize(
      |                            ^~~~~~~~~~~
In file included from solvers/no_dreal.cc:2:
bazel-out/aarch64-opt/bin/solvers/_virtual_includes/_dreal_solver_compiled_cc_impl/drake/solvers/dreal_solver.h:25:1: note: declared here
   25 | DrealSolver final : public SolverBase {
      | ^~~~~~~~~~~
cc1plus: some warnings being treated as errors
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2333.051s, Critical Path: 355.40s
INFO: 103 processes: 10 internal, 92 linux-sandbox, 1 worker.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:86: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:839: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

I'm pretty new to bazel so I assume that I'm passing the commands incorrectly. Does anyone have any advice?

jwnimmer-tri commented 1 year ago

Apologies, you've hit two new bugs.

(1) Drake's CMake build logic ended up using -Werror=deprecated-declarations outside of Drake CI. Ideally, we would not promote warnings to errors in your case: the compiler would still warn, but the build would proceed. That's logged as #18691 now.

(2) The "no dReal" config switch is not tested in CI when using GCC on arm64, so I missed some spurious deprecation warnings that it emits.

Easiest is probably if you could cherry-pick #18686 in your local build. It's scheduled to merge in ~48 hours, and will remove dReal entirely.

Alternatively, something along the lines of this patch should suppress the warnings:

--- a/solvers/BUILD.bazel
+++ b/solvers/BUILD.bazel
@@ -762,6 +762,7 @@ drake_cc_variant_library(
     srcs_enabled = ["dreal_solver.cc"],
     srcs_disabled = ["no_dreal.cc"],
     hdrs = ["dreal_solver.h"],
+    copts = ["-Wno-deprecated-declarations"],
     interface_deps = [
         ":solver_base",
         "//common:essential",

LordAcrobaticTurtle commented 1 year ago

Hi Jeremy

Thanks for the response. I think there is a bit more work to do :( Do you have any more suggestions? This compile error was thrown after applying that cherry pick commit

[ 12%] Performing build step for 'drake_cxx_python'
INFO: Analyzed target //:install (1 packages loaded, 627 targets configured).
INFO: Found 1 target...
INFO: From Compiling multibody/benchmarks/free_body/free_body.cc:
In file included from multibody/benchmarks/free_body/free_body.cc:1:
bazel-out/aarch64-opt/bin/multibody/benchmarks/free_body/_virtual_includes/free_body_only/drake/multibody/benchmarks/free_body/free_body.h: In member function 'std::pair<double, double> drake::multibody::benchmarks::free_body::FreeBody::CalcAngularRates_s_p() const':
bazel-out/aarch64-opt/bin/multibody/benchmarks/free_body/_virtual_includes/free_body_only/drake/multibody/benchmarks/free_body/free_body.h:182:58: note: parameter passing for argument of type 'std::pair<double, double>' when C++17 is enabled changed to match C++14 in GCC 10.1
  182 |   std::pair<double, double> CalcAngularRates_s_p() const {
      |                                                          ^
ERROR: /root/.cache/bazel/_bazel_/58387062a7d0fbb04042d57e5e514448/external/ibex/BUILD.bazel:200:11: Compiling external/ibex/filibsrc-3.0.2.2/fp_traits/fp_traits_sse_const.cpp failed: (Exit 1): cc failed: error executing command (from target @ibex//:filib) /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 30 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/tmp/ccblV1KD.s: Assembler messages:
/tmp/ccblV1KD.s:20: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
/tmp/ccblV1KD.s:34: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
/tmp/ccblV1KD.s:45: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
/tmp/ccblV1KD.s:56: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1880.896s, Critical Path: 747.88s
INFO: 394 processes: 9 internal, 385 linux-sandbox.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:86: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:839: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

adeeb10abbas commented 1 year ago

@LordAcrobaticTurtle Do you have a small repro that I could give a shot? On my Parallels Focal Jammy VM on M1 Pro (Ubuntu aarch64), I am able to build and run drake-external-examples/drake_cmake_external just fine. I can give Jammy a try later

EDIT - I am actually on a Jammy VM, not focal.

jwnimmer-tri commented 1 year ago

@LordAcrobaticTurtle that error message seems impossible after the cherry-pick, because the external/ibex/BUILD.bazel file it's complaining about was deleted in #18686.

LordAcrobaticTurtle commented 1 year ago

@adeeb10abbas I'm trying to build the drake library so it can be used in an external project, unfortunately, I don't have any code I can share. I have not modified the build at all.

@jwnimmer-tri Yes, my apologies, I made a mistake when cherry picking that commit. I'm fairly certain I cherry-picked correctly this time, however I am still receiving this error.

[ 12%] Performing build step for 'drake_cxx_python'
INFO: Build option --define has changed, discarding analysis cache.
INFO: Analyzed target //:install (0 packages loaded, 14561 targets configured).
INFO: Found 1 target...
ERROR: /home/dev/drake.og/bindings/pydrake/systems/BUILD.bazel:88:21: Compiling bindings/pydrake/systems/primitives_py.cc failed: (Exit 1): cc failed: error executing command (from target //bindings/pydrake/systems:primitives.cpython-310-aarch64-linux-gnu.so) /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 228 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
cc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 218.381s, Critical Path: 189.47s
INFO: 7 processes: 7 internal.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:86: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:839: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

Happy to provide clarification/more info if needed 🙂

jwnimmer-tri commented 1 year ago

@LordAcrobaticTurtle see the Out Of Memory tip.

jwnimmer-tri commented 10 months ago

Updates: The build system work at #17231 will be helpful here, and is nearing completion. In particular, the VTK change in #16502 will remove the use of precompiled custom binaries as part of a Drake install. The only dependencies necessary to build, install, and run Drake will be things already in Ubuntu.

alsocapprin commented 6 months ago

Latest in this saga, I'm trying to install Drake on a Jetson Orin. After following the guidance above, building almost succeeds, but I run into a linking error. I'm not tremendously experienced with this process, so any help would be appreciated. If this could be made easier with #17231 as commented above, I could potentially try a pre-release.

I'm eventually interested in getting the python bindings to build, but I figured I would try first with the base project.

Error Log

cbass@amp-jetson-01:~/Projects/drake$ bazel build //... --define NO_DREAL=ON
INFO: Analyzed 11957 targets (0 packages loaded, 0 targets configured).
INFO: Found 11957 targets...
ERROR: /home/cbass/Projects/drake/common/BUILD.bazel:1079:20: Linking common/text_logging_no_spdlog_test failed: (Exit 1): gcc failed: error executing command (from target //common:text_logging_no_spdlog_test) /usr/bin/gcc @bazel-out/aarch64-opt/bin/common/text_logging_no_spdlog_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal::~ThreadLocal(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal::~ThreadLocal(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal > >::~ThreadLocal(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal > >::~ThreadLocal(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal > >::GetOrCreateValue() const: error: undefined reference to 'pthread_setspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::GetTestPartResultReporterForCurrentThread(): error: undefined reference to 'pthread_setspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::UnitTest::AddTestPartResult(testing::TestPartResult::Type, char const*, int, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&): error: undefined reference to 'pthread_setspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::UnitTest::AddTestPartResult(testing::TestPartResult::Type, char const*, int, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&): error: undefined reference to 'pthread_setspecific'
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 1722.646s, Critical Path: 303.34s
INFO: 2421 processes: 878 internal, 1543 linux-sandbox.
FAILED: Build did NOT complete successfully

alsocapprin commented 6 months ago

Update: I managed to get a build working by adding the -pthread linker option to failing components. At that point, running a bazel build //... succeeds.
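
One way to find which components need the extra linker option is to scan the build log for failing link targets and the symbols they are missing. The following is a minimal sketch, assuming the log has the same shape as the one above. `SAMPLE_LOG` is a short excerpt of that log, and the function names (`summarize_link_failures`, `needs_pthread`) are made up for illustration; they are not part of Drake or Bazel.

```python
import re
from collections import Counter

# Short excerpt of the log above; a real run would read the full log from a file.
SAMPLE_LOG = """\
ERROR: /home/cbass/Projects/drake/common/BUILD.bazel:1079:20: Linking common/text_logging_no_spdlog_test failed: (Exit 1): gcc failed: error executing command (from target //common:text_logging_no_spdlog_test) /usr/bin/gcc @bazel-out/aarch64-opt/bin/common/text_logging_no_spdlog_test-2.params
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
"""

# Bazel names the failing target in "(from target //pkg:name)".
TARGET_RE = re.compile(r"\(from target (//[^)]+)\)")
# The linker names each missing symbol in single quotes.
SYMBOL_RE = re.compile(r"undefined reference to '([^']+)'")


def summarize_link_failures(log_text):
    """Return (sorted failing targets, Counter of undefined symbols)."""
    targets = sorted(set(TARGET_RE.findall(log_text)))
    symbols = Counter(SYMBOL_RE.findall(log_text))
    return targets, symbols


def needs_pthread(symbols):
    """Heuristic: any missing pthread_* symbol suggests a missing -pthread."""
    return any(name.startswith("pthread_") for name in symbols)


targets, symbols = summarize_link_failures(SAMPLE_LOG)
for target in targets:
    print("failing target:", target)
for name, count in sorted(symbols.items()):
    print(f"missing symbol: {name} (x{count})")
print("likely needs -pthread:", needs_pthread(symbols))
```

The prefix check is only a heuristic. It flags targets worth a look and does not prove that `-pthread` is the right fix for each of them.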

When I run bazel test //..., all but ~300 tests succeed; the failures appear to be related to the Python bindings and VTK. This comment above appears to have the tools to address at least the VTK issues, but I won't replicate them given that #17321 stands a chance of doing so for me. I've tested several of the included examples as well, which makes me think that "normal" use of Drake works, as long as I'm not using things like VTK. (edit: Some of these failures are due to exec format errors; others have to do with the fact that I'm running tests on a headless machine and there is no display)

One final note: When I use the CMake workflow to build the Python bindings, the build succeeds, but install fails because strip can't read one of the .so files related to transformations - there may be more. I assume that this is an x86 binary that isn't being built locally, like VTK.
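
The guess that a file is an x86 binary can be checked directly by reading the target architecture from its ELF header. The sketch below uses only the Python standard library. The helper names (`elf_machine`, `elf_machine_of_file`) are hypothetical, and the machine codes listed are the standard ELF `e_machine` values for these four architectures only.

```python
import struct
import sys

# Standard ELF e_machine codes for a few common architectures.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(header):
    """Return the architecture named in an ELF header, or None if not ELF.

    `header` must hold at least the first 20 bytes of the file.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) selects the byte order: 1 = little endian, 2 = big.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def elf_machine_of_file(path):
    """Read just enough of `path` to classify it."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))


if __name__ == "__main__":
    # Usage: python check_elf.py path/to/lib1.so path/to/lib2.so
    for path in sys.argv[1:]:
        print(path, elf_machine_of_file(path))
```

On an arm64 machine, a shared library reported as `x86_64` would be consistent with both the strip failure and the exec format errors mentioned above.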

That's all I've done up until now. I'm happy to try suggestions to get all the tests passing or the python bindings installing correctly; otherwise, this is probably all I'll pull on this thread for now.

jwnimmer-tri commented 6 months ago

When building Drake >= v1.21.0, VTK is compiled from source (and so no longer uses the pre-compiled x86-specific binaries). The only downloaded object code that's not built from source are some opt-in commercial solvers (mosek, gurobi), a docs-only tool (doxygen), and a test-only tool (buildifier; see ~#20638~ #20579).

alsocapprin commented 6 months ago

Thanks for the response - I think then my VTK test failures are related to the lack of display, and the other test failures are related to buildifier.

jwnimmer-tri commented 4 months ago

> When building Drake >= v1.21.0, VTK is compiled from source (and so no longer uses the pre-compiled x86-specific binaries). The only downloaded object code that's not built from source are some opt-in commercial solvers (mosek, gurobi), a docs-only tool (doxygen), and a test-only tool (buildifier).

As of #20579, Drake's @buildifier repository rule now supports arm64 natively, so this is no longer a problem.

> I managed to get a build working by adding the -pthread linker option to failing components.

We'd welcome pull request(s) with fixes like that.

alsocapprin commented 4 months ago

@jwnimmer-tri Apologies for the delay, but I've opened a PR (above) with our additional linker options. I haven't had the time yet to run extensive testing (apart from our own day-to-day use and an initial set of bazel tests), but I'll do some more wringout in the next day or two.

jwnimmer-tri commented 3 months ago

Per #21030, we have some CI jobs defined now for arm64 here.

Unsurprisingly, they indicate not everything is working yet. In particular, install_bazel.sh is a no-op on arm64 per #18261. Solving that is the next step here.

jwnimmer-tri commented 3 months ago

The CI jobs are up and running, with Bazel installed properly.

To see the current status, check linux-arm-jammy-unprovisioned-gcc-bazel-experimental-release for the most recent build log.

As of today, there are around 40 failing tests. The next job for our arm64 user community is to open pull requests with fixes for those tests. It's fine (and preferable) to chip away at them one or a few at a time, not all at once. As long as a PR makes at least one test better, and no tests worse, we're still happy to merge the PR even with overall failing CI.
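
The merge criterion above can be read as a comparison between the sets of failing tests before and after a PR. The sketch below is only an illustration of that rule: the function name and the test labels are hypothetical, and this is not how Drake's CI evaluates PRs.

```python
def pr_is_acceptable(failing_before, failing_after):
    """True if the PR fixes at least one test and breaks none."""
    before, after = set(failing_before), set(failing_after)
    newly_fixed = before - after    # failed before, pass now
    newly_broken = after - before   # passed before, fail now
    return bool(newly_fixed) and not newly_broken


# Hypothetical test labels, for illustration only.
before = {"//pkg:a_test", "//pkg:b_test", "//pkg:c_test"}

# Fixes one test and leaves the rest as they were: acceptable.
print(pr_is_acceptable(before, {"//pkg:b_test", "//pkg:c_test"}))

# Fixes one test but breaks another: not acceptable.
print(pr_is_acceptable(before, {"//pkg:b_test", "//pkg:c_test", "//pkg:d_test"}))
```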