dartsim / dart

DART: Dynamic Animation and Robotics Toolkit
http://dartsim.github.io/
BSD 2-Clause "Simplified" License

Fails to build with GCC-10 on riscv64 #1482

Open cpaelzer opened 4 years ago

cpaelzer commented 4 years ago

Environment

Expected Behavior

Build and tests work fine

Current Behavior

On riscv64 the build-time tests fail

     29 - test_ContactConstraint (SEGFAULT)

Here is the initial build log

I have rebuilt the same with increased verbosity, as recommended. The build is in this PPA, and you can access the logs for all riscv64 builds from there.

Steps to Reproduce

  1. Get the dart sources and install the build dependencies
  2. Get GCC-10
  3. Build on riscv64
  4. Run the tests

jslee02 commented 4 years ago

Thank you for the report! I'm not familiar with these architectures (riscv64 and s390x). Could you please let me know what they are generally used for? Also, I'm afraid I'm not sure how to reproduce and debug the issues without access to those architectures. Is there a way to emulate them?

cpaelzer commented 4 years ago

Hi, they are on opposite ends of the architecture spectrum.

s390x is hard to emulate correctly, but you can get access to a free virtual guest at https://developer.ibm.com/components/ibm-linuxone/gettingstarted/

riscv64 can be emulated via qemu like this:

qemu-system-riscv64 -machine virt -m 2048 -smp 4 \
  -bios qemu-virt-20200504-ubuntu-firmware.bin \
  -device virtio-blk-device,drive=vda \
  -drive file=grovy-updated-20200811.qcow2,id=vda \
  -device virtio-net-device,netdev=eth0 \
  -netdev user,id=eth0,hostfwd=tcp::5555-:22

You can get instructions and the base images (for Ubuntu 20.04) here https://people.ubuntu.com/~wgrant/riscv64/
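
The `hostfwd=tcp::5555-:22` option in the qemu command above forwards host port 5555 to the guest's SSH port. As a rough sketch (the helper name `ssh_banner` and the timeout are my own choices, not part of the images or instructions), one way to check from the host whether the guest's SSH server is already answering is to read the identification line that SSH servers send on connect:

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Return the SSH identification line sent by host:port, or None.

    None is returned if the connection fails, times out, or the peer
    sends something that does not look like an SSH banner.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:  # covers refused connections and timeouts
        return None
    text = data.decode("ascii", errors="replace").strip()
    return text if text.startswith("SSH-") else None


if __name__ == "__main__":
    # 5555 is the host side of hostfwd=tcp::5555-:22 from the qemu command.
    banner = ssh_banner("127.0.0.1", 5555)
    if banner is None:
        print("no SSH banner on port 5555 yet; the guest may still be booting")
    else:
        print("guest SSH is up:", banner)
```

Reading the banner rather than only opening the port matters because qemu itself may accept the connection on the forwarded port before the guest has finished booting.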

cpaelzer commented 4 years ago

I slowly built this on a local riscv64 qemu emulation and got the same crash. To check that the emulated environment is generally usable, I first ran another test:

root@ubuntu:~/dart-6.9.2# ./obj-riscv64-linux-gnu/unittests/unit/test_CollisionGroups
Running main() from /root/dart-6.9.2/unittests/gtest/src/gtest_main.cc
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from CollisionEngine/CollisionGroupsTest
[ RUN      ] CollisionEngine/CollisionGroupsTest.SkeletonSubscription/0
Running CollisionGroups test for [dart]
[       OK ] CollisionEngine/CollisionGroupsTest.SkeletonSubscription/0 (78 ms)
...
[       OK ] CollisionEngine/CollisionGroupsTest.BodyNodeSubscription/3 (0 ms)
[----------] 8 tests from CollisionEngine/CollisionGroupsTest (194 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (198 ms total)
[  PASSED  ] 8 tests.

So, assuming that I can generally use this setup for debugging, I ran the failing test, and it crashed as expected.

root@ubuntu:~/dart-6.9.2# ./obj-riscv64-linux-gnu/unittests/unit/test_ContactConstraint 
Running main() from /root/dart-6.9.2/unittests/gtest/src/gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ContactConstraint
[ RUN      ] ContactConstraint.ContactWithKinematicJoint
Segmentation fault

Running the same under GDB is unfortunately far from helpful:

Starting program: /root/dart-6.9.2/obj-riscv64-linux-gnu/unittests/unit/test_ContactConstraint 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Running main() from /root/dart-6.9.2/unittests/gtest/src/gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ContactConstraint
[ RUN      ] ContactConstraint.ContactWithKinematicJoint

Program received signal SIGSEGV, Segmentation fault.
0x0000000000001000 in ?? ()
(gdb) bt
#0  0x0000000000001000 in ?? ()
(gdb) 

I started at testContactWithKinematicJoint and stepped my way towards the problem. It is a bit hard to track it this way, because once the program has crashed the backtrace is empty.

Of "getLinearVelocity,getSpatialVelocity,getTransform", the last one hit is const Eigen::Vector6d& Frame::getSpatialVelocity() const.

Going deeper from there, of the list "getSpatialVelocity,getRelativeTransform,AdInvT,getRelativeSpatialVelocity,getParentFrame" the last one hit is dart::math::AdInvT (_T=..., _V=...) at ./dart/math/Geometry.cpp:714

I checked dart::math::AdInvT in multiple runs, and every time it was hit 5582 times before the crash. That makes it possible to step onto the failing execution of AdInvT.
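
Since the hit count is stable across runs, gdb's `ignore` command can skip the earlier hits instead of continuing by hand. The following is only a sketch: the helper `gdb_commands_for_nth_hit` is a made-up name, and it assumes a fresh gdb session in which the new breakpoint gets number 1. It prints a command list that could be saved to a file and passed to `gdb -x`:

```python
from typing import List


def gdb_commands_for_nth_hit(symbol: str, nth_hit: int, bp_number: int = 1) -> List[str]:
    """Build gdb commands that stop on the nth hit of a breakpoint.

    bp_number is an assumption: in a fresh gdb session the first
    breakpoint is numbered 1.
    """
    if nth_hit < 1:
        raise ValueError("nth_hit must be at least 1")
    commands = [f"break {symbol}"]
    if nth_hit > 1:
        # "ignore N COUNT" skips the next COUNT crossings of breakpoint N.
        commands.append(f"ignore {bp_number} {nth_hit - 1}")
    commands.append("run")
    return commands


if __name__ == "__main__":
    # 5582 is the hit count observed before the crash in the runs above.
    print("\n".join(gdb_commands_for_nth_hit("dart::math::AdInvT", 5582)))
```

With these commands gdb stops on the 5582nd call of AdInvT, from where `si` or `n` can be used to walk towards the crash.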

The stack at this point looks like this:

(gdb) bt
#0  0x0000003ff7d6ffe8 in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:619
#1  dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:136
#2  0x0000003ff7d71996 in dart::dynamics::Frame::getSpatialVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>, 
    _inCoordinatesOf=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>) at ./dart/dynamics/Frame.cpp:168
#3  0x0000003ff7d71aca in dart::dynamics::Frame::getLinearVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>, 
    _inCoordinatesOf=<optimized out>) at ./dart/dynamics/Frame.cpp:222
#4  0x0000002aaaacb390 in testContactWithKinematicJoint (lcpSolver=..., tol=9.9999999999999995e-07) at ./unittests/unit/test_ContactConstraint.cpp:88
#5  0x0000002aaaacba42 in ContactConstraint_ContactWithKinematicJoint_Test::TestBody (this=<optimized out>) at /usr/include/c++/10/new:175
#6  0x0000002aaaaeedea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x2aaaaf29c8 "the test body", method=<optimized out>, object=0x2aaab23ce0)
    at ./unittests/gtest/src/gtest-internal-inl.h:807
#7  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x2aaab23ce0, method=<optimized out>, 
    location=location@entry=0x2aaaaf29c8 "the test body") at ./unittests/gtest/src/gtest.cc:2479
#8  0x0000002aaaae7502 in testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2517
#9  testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2508
#10 0x0000002aaaae75f0 in testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2693
#11 testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2667
#12 0x0000002aaaae7680 in testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2810
#13 testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2796
#14 0x0000002aaaae7a30 in testing::internal::UnitTestImpl::RunAllTests (this=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:592
#15 0x0000002aaaaef21a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    location=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:807
#16 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2aaab22b80, method=<optimized out>, 
    location=location@entry=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)") at ./unittests/gtest/src/gtest.cc:2479
#17 0x0000002aaaae7b46 in testing::UnitTest::Run (this=0x2aaab0a0e0 <testing::UnitTest::GetInstance()::instance>) at ./unittests/gtest/include/gtest/gtest.h:1340
#18 0x0000002aaaac5160 in RUN_ALL_TESTS () at ./unittests/gtest/include/gtest/gtest.h:2341
#19 main (argc=<optimized out>, argv=0x3ffffff458) at ./unittests/gtest/src/gtest_main.cc:36

From there it returns to: 0x0000003ff7d6ffd4 in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:619

From there I see it process some instructions from const Eigen::Isometry3d& WorldFrame::getRelativeTransform() const

(gdb) si
0x0000003ff7d6ffd6  619   return mRelativeTf;
...

As you can see, compiler optimizations make this not perfectly linear.

Next I see some instructions from const Eigen::Vector6d& Frame::getSpatialVelocity() const again. But then all of a sudden I see _RandomAccessIterator pop up:

(gdb) si
0x0000003ff7d70058  150   return mVelocity;
(gdb) 
0x0000003ff7d7005a  150   return mVelocity;
(gdb) 
0x0000003ff7d3ec9e in virtual thunk to dart::dynamics::BodyNode::getRelativeSpatialVelocity() const () at /usr/include/c++/10/bits/stl_heap.h:421
421       while (__last - __first > 1)
(gdb) frame
#0  0x0000003ff7d3ec9e in virtual thunk to dart::dynamics::BodyNode::getRelativeSpatialVelocity() const () at /usr/include/c++/10/bits/stl_heap.h:421
421       while (__last - __first > 1)

That still passes fine and I see code from 0x0000003ff7cf80c8 in dart::dynamics::Joint::getRelativeSpatialVelocity() const@plt () from /root/dart-6.9.2/obj-riscv64-linux-gnu/lib/libdart.so.6

At this point I am at:

(gdb) bt
#0  0x0000003ff7dac22a in dart::dynamics::Joint::getRelativeSpatialVelocity (this=0x2aaab2cea0) at ./dart/dynamics/Joint.cpp:355
#1  0x0000003ff7d7005c in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:150
#2  dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:136
#3  0x0000003ff7d71996 in dart::dynamics::Frame::getSpatialVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>, 
    _inCoordinatesOf=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>) at ./dart/dynamics/Frame.cpp:168
#4  0x0000003ff7d71aca in dart::dynamics::Frame::getLinearVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>, 
    _inCoordinatesOf=<optimized out>) at ./dart/dynamics/Frame.cpp:222
#5  0x0000002aaaacb390 in testContactWithKinematicJoint (lcpSolver=..., tol=9.9999999999999995e-07) at ./unittests/unit/test_ContactConstraint.cpp:88
#6  0x0000002aaaacba42 in ContactConstraint_ContactWithKinematicJoint_Test::TestBody (this=<optimized out>) at /usr/include/c++/10/new:175
#7  0x0000002aaaaeedea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x2aaaaf29c8 "the test body", method=<optimized out>, object=0x2aaab23ce0)
    at ./unittests/gtest/src/gtest-internal-inl.h:807
#8  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x2aaab23ce0, method=<optimized out>, 
    location=location@entry=0x2aaaaf29c8 "the test body") at ./unittests/gtest/src/gtest.cc:2479
#9  0x0000002aaaae7502 in testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2517
#10 testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2508
#11 0x0000002aaaae75f0 in testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2693
#12 testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2667
#13 0x0000002aaaae7680 in testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2810
#14 testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2796
#15 0x0000002aaaae7a30 in testing::internal::UnitTestImpl::RunAllTests (this=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:592
#16 0x0000002aaaaef21a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    location=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:807
#17 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2aaab22b80, method=<optimized out>, 
    location=location@entry=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)") at ./unittests/gtest/src/gtest.cc:2479
#18 0x0000002aaaae7b46 in testing::UnitTest::Run (this=0x2aaab0a0e0 <testing::UnitTest::GetInstance()::instance>) at ./unittests/gtest/include/gtest/gtest.h:1340
#19 0x0000002aaaac5160 in RUN_ALL_TESTS () at ./unittests/gtest/include/gtest/gtest.h:2341
#20 main (argc=<optimized out>, argv=0x3ffffff458) at ./unittests/gtest/src/gtest_main.cc:36

Returning further from here ...

A call to

(gdb) bt
#0  0x0000003ff7d819fa in non-virtual thunk to dart::dynamics::GenericJoint<dart::math::SE3Space>::updateRelativeSpatialVelocity() const () at ./dart/dynamics/Joint.cpp:668
#1  0x0000003ff7dac23c in dart::dynamics::Joint::getRelativeSpatialVelocity (this=0x2aaab2cea0) at ./dart/dynamics/Joint.cpp:357
#2  0x0000003ff7d7005c in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:150

Further through dart::dynamics::GenericJoint<dart::math::SE3Space>::getRelativeJacobianStatic (this=0x2aaab2ca20) at ./dart/dynamics/detail/GenericJoint.hpp:1588

Then I seem to be back up at the test within testContactWithKinematicJoint

(gdb) bt
#0  testContactWithKinematicJoint (lcpSolver=..., tol=<optimized out>) at ./unittests/unit/test_ContactConstraint.cpp:48
#1  0x0000002aaaacba42 in ContactConstraint_ContactWithKinematicJoint_Test::TestBody (this=<optimized out>) at /usr/include/c++/10/new:175
#2  0x0000002aaaaeedea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x2aaaaf29c8 "the test body", method=<optimized out>, object=0x2aaab23ce0)
    at ./unittests/gtest/src/gtest-internal-inl.h:807
#3  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x2aaab23ce0, method=<optimized out>, 
    location=location@entry=0x2aaaaf29c8 "the test body") at ./unittests/gtest/src/gtest.cc:2479
#4  0x0000002aaaae7502 in testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2517
#5  testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2508
#6  0x0000002aaaae75f0 in testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2693
#7  testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2667
#8  0x0000002aaaae7680 in testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2810
#9  testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2796
#10 0x0000002aaaae7a30 in testing::internal::UnitTestImpl::RunAllTests (this=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:592
#11 0x0000002aaaaef21a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    location=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:807
#12 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2aaab22b80, method=<optimized out>, 
    location=location@entry=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)") at ./unittests/gtest/src/gtest.cc:2479
#13 0x0000002aaaae7b46 in testing::UnitTest::Run (this=0x2aaab0a0e0 <testing::UnitTest::GetInstance()::instance>) at ./unittests/gtest/include/gtest/gtest.h:1340
#14 0x0000002aaaac5160 in RUN_ALL_TESTS () at ./unittests/gtest/include/gtest/gtest.h:2341
#15 main (argc=<optimized out>, argv=0x3ffffff458) at ./unittests/gtest/src/gtest_main.cc:36

Something suspicious here:

(gdb) n
77    for (auto i = 0u; i < 100; ++i)
(gdb) 
Cannot access memory at address 0x1000

Remember that 0x1000 is the address where my backtrace eventually ends up dead and stuck. From here I can't go on: the instruction pointer is stuck and gdb can't continue anymore.

The last sign of life before this was at:

testContactWithKinematicJoint (lcpSolver=..., tol=9.9999999999999995e-07) at ./unittests/gtest/include/gtest/gtest.h:317
317   operator bool() const { return success_; }  // NOLINT

And a gdb "next" command got me to the point where it is dead. The return from there must have killed it, but I don't see how.

(gdb) info registers 
ra             0x2aaaacba42 0x2aaaacba42 <ContactConstraint_ContactWithKinematicJoint_Test::TestBody()+120>
sp             0x3fffffef30 0x3fffffef30
gp             0x2aaab09ac0 0x2aaab09ac0
tp             0x3ff44d2720 0x3ff44d2720
t0             0x2aaab3be98 183252532888
t1             0x1000   4096
t2             0xfffffffffffffe68   -408
fp             0x2aaab23ce0 0x2aaab23ce0
s1             0x3ff7ffd8a8 274743679144
a0             0x2aaab23e00 183252434432
a1             0x0  0
a2             0x2aaaaef658 183252219480
a3             0x1  1
a4             0x5a6359d952f93000   6513148276042706944
a5             0x5a6359d952f93000   6513148276042706944
a6             0x0  0
a7             0x2aaab2d1d8 183252472280
s2             0x3fffffef38 274877902648
s3             0x3ff7ffd8a8 274743679144
s4             0x2aaaaf29c8 183252232648
s5             0x2aaaad0e0c 183252094476
s6             0x0  0
s7             0x2aaab207e0 183252420576
s8             0x1  1
s9             0x0  0
s10            0x2aaab0a234 183252329012
s11            0x1  1
t3             0x3ff6ea9dc6 274725510598
t4             0x15 21
t5             0x1  1
t6             0x10 16
pc             0x2aaaacb554 0x2aaaacb554 <testContactWithKinematicJoint(std::shared_ptr<dart::constraint::BoxedLcpSolver> const&, double)+5146>

It seems like the program flow itself was broken. Register t1 holds that suspicious 0x1000, but I'm not a riscv64 expert.
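
To double-check which registers hold the faulting address without reading the dump by eye, the `info registers` output can be scanned mechanically. This is just a sketch with hypothetical helper names (`parse_registers`, `registers_holding`); the sample lines are copied from the dump above:

```python
import re
from typing import Dict, List

# A register line starts with the register name followed by a hex value.
REG_LINE = re.compile(r"^(\w+)\s+(0x[0-9a-fA-F]+)\b")


def parse_registers(dump: str) -> Dict[str, int]:
    """Map register names to integer values from gdb 'info registers' text."""
    registers = {}
    for line in dump.splitlines():
        match = REG_LINE.match(line.strip())
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers


def registers_holding(dump: str, value: int) -> List[str]:
    """Return the names of all registers whose value equals 'value'."""
    return [name for name, v in parse_registers(dump).items() if v == value]


SAMPLE = """\
ra             0x2aaaacba42 0x2aaaacba42 <ContactConstraint_ContactWithKinematicJoint_Test::TestBody()+120>
t0             0x2aaab3be98 183252532888
t1             0x1000   4096
t2             0xfffffffffffffe68   -408
t6             0x10 16
"""

if __name__ == "__main__":
    print(registers_holding(SAMPLE, 0x1000))
```

Feeding it the complete dump instead of the excerpt would show whether any register other than t1 carries 0x1000.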

It might after all be a gcc-10 bug, as it seemed to work fine with gcc-9 (or gcc-10 now exposes a bug in dart). I can't see an immediate fix from this, but I hope it helps you see what might be going on.