Open cpaelzer opened 4 years ago
Thank you for the report! I'm not familiar with these architectures (riscv64 and s390x). Could you please let me know what they are generally used for? Also, I'm afraid I'm not sure how to reproduce and debug the issues without access to those architectures. Is there a way to emulate them?
Hi, they are on opposite ends of the architecture spectrum.
s390x is hard to emulate correctly, but you can get access to a free virtual guest at https://developer.ibm.com/components/ibm-linuxone/gettingstarted/
riscv64 can be emulated with qemu like this:
qemu-system-riscv64 -machine virt -m 2048 -smp 4 \
-bios qemu-virt-20200504-ubuntu-firmware.bin \
-device virtio-blk-device,drive=vda \
-drive file=grovy-updated-20200811.qcow2,id=vda \
-device virtio-net-device,netdev=eth0 \
-netdev user,id=eth0,hostfwd=tcp::5555-:22
You can get instructions and the base images (for Ubuntu 20.04) here: https://people.ubuntu.com/~wgrant/riscv64/
I slowly built this in a local riscv64 qemu emulation and got the same crash. To check that the emulated environment is generally usable, I first ran a different test:
root@ubuntu:~/dart-6.9.2# ./obj-riscv64-linux-gnu/unittests/unit/test_CollisionGroups
Running main() from /root/dart-6.9.2/unittests/gtest/src/gtest_main.cc
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from CollisionEngine/CollisionGroupsTest
[ RUN ] CollisionEngine/CollisionGroupsTest.SkeletonSubscription/0
Running CollisionGroups test for [dart]
[ OK ] CollisionEngine/CollisionGroupsTest.SkeletonSubscription/0 (78 ms)
...
[ OK ] CollisionEngine/CollisionGroupsTest.BodyNodeSubscription/3 (0 ms)
[----------] 8 tests from CollisionEngine/CollisionGroupsTest (194 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 1 test case ran. (198 ms total)
[ PASSED ] 8 tests.
So, assuming that I can generally use this setup for debugging, I ran the failing test and it crashed as expected.
root@ubuntu:~/dart-6.9.2# ./obj-riscv64-linux-gnu/unittests/unit/test_ContactConstraint
Running main() from /root/dart-6.9.2/unittests/gtest/src/gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ContactConstraint
[ RUN ] ContactConstraint.ContactWithKinematicJoint
Segmentation fault
Throwing the same into GDB is unfortunately far from helpful:
Starting program: /root/dart-6.9.2/obj-riscv64-linux-gnu/unittests/unit/test_ContactConstraint
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Running main() from /root/dart-6.9.2/unittests/gtest/src/gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ContactConstraint
[ RUN ] ContactConstraint.ContactWithKinematicJoint
Program received signal SIGSEGV, Segmentation fault.
0x0000000000001000 in ?? ()
(gdb) bt
#0 0x0000000000001000 in ?? ()
(gdb)
I started at testContactWithKinematicJoint and stepped my way towards the problem.
It is a bit hard to track it down that way, because once the program has crashed the backtrace is empty.
The last hit of any of "getLinearVelocity,getSpatialVelocity,getTransform" is const Eigen::Vector6d& Frame::getSpatialVelocity() const.
Next I went deeper from there: of the list "getSpatialVelocity,getRelativeTransform,AdInvT,getRelativeSpatialVelocity,getParentFrame", the last hit is dart::math::AdInvT (_T=..., _V=...) at ./dart/math/Geometry.cpp:714
I checked dart::math::AdInvT in multiple runs, and every time it was hit 5582 times before the crash. That makes it possible to step onto the failing execution of AdInvT.
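The stable hit count can be turned into a breakpoint that only stops on the last call. Below is a minimal sketch, assuming the count of 5582 stays the same across runs as observed above. The helper name `gdb_script_for_nth_hit` and the idea of generating a command file are only illustrative and not part of DART. It relies on gdb's `ignore` command, which skips a given number of crossings of a breakpoint, and on `set breakpoint pending on`, so that the breakpoint is accepted before libdart.so is loaded:

```python
def gdb_script_for_nth_hit(function, hit_number):
    """Return gdb commands that stop on the hit_number-th call of function.

    Assumes this is the first breakpoint of the session, so gdb
    assigns it the number 1.
    """
    if hit_number < 1:
        raise ValueError("hit_number must be at least 1")
    return [
        # Accept the breakpoint even though the shared library
        # containing the function is not loaded yet.
        "set breakpoint pending on",
        f"break {function}",
        # Skip the first hit_number - 1 crossings of breakpoint 1.
        f"ignore 1 {hit_number - 1}",
        "run",
    ]


if __name__ == "__main__":
    # 5582 is the hit count observed before the crash in the runs above.
    print("\n".join(gdb_script_for_nth_hit("dart::math::AdInvT", 5582)))
```

Saving the printed lines to a file and passing it to gdb with `-x`, together with the test binary, should stop in the last AdInvT call before the crash, from where single-stepping can continue.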
The stack at this point looks like this:
(gdb) bt
#0 0x0000003ff7d6ffe8 in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:619
#1 dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:136
#2 0x0000003ff7d71996 in dart::dynamics::Frame::getSpatialVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>,
_inCoordinatesOf=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>) at ./dart/dynamics/Frame.cpp:168
#3 0x0000003ff7d71aca in dart::dynamics::Frame::getLinearVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>,
_inCoordinatesOf=<optimized out>) at ./dart/dynamics/Frame.cpp:222
#4 0x0000002aaaacb390 in testContactWithKinematicJoint (lcpSolver=..., tol=9.9999999999999995e-07) at ./unittests/unit/test_ContactConstraint.cpp:88
#5 0x0000002aaaacba42 in ContactConstraint_ContactWithKinematicJoint_Test::TestBody (this=<optimized out>) at /usr/include/c++/10/new:175
#6 0x0000002aaaaeedea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x2aaaaf29c8 "the test body", method=<optimized out>, object=0x2aaab23ce0)
at ./unittests/gtest/src/gtest-internal-inl.h:807
#7 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x2aaab23ce0, method=<optimized out>,
location=location@entry=0x2aaaaf29c8 "the test body") at ./unittests/gtest/src/gtest.cc:2479
#8 0x0000002aaaae7502 in testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2517
#9 testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2508
#10 0x0000002aaaae75f0 in testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2693
#11 testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2667
#12 0x0000002aaaae7680 in testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2810
#13 testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2796
#14 0x0000002aaaae7a30 in testing::internal::UnitTestImpl::RunAllTests (this=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:592
#15 0x0000002aaaaef21a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:807
#16 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2aaab22b80, method=<optimized out>,
location=location@entry=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)") at ./unittests/gtest/src/gtest.cc:2479
#17 0x0000002aaaae7b46 in testing::UnitTest::Run (this=0x2aaab0a0e0 <testing::UnitTest::GetInstance()::instance>) at ./unittests/gtest/include/gtest/gtest.h:1340
#18 0x0000002aaaac5160 in RUN_ALL_TESTS () at ./unittests/gtest/include/gtest/gtest.h:2341
#19 main (argc=<optimized out>, argv=0x3ffffff458) at ./unittests/gtest/src/gtest_main.cc:36
From there it returns to:
0x0000003ff7d6ffd4 in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:619
From there I see it process some instructions from const Eigen::Isometry3d& WorldFrame::getRelativeTransform() const
(gdb) si
0x0000003ff7d6ffd6 619 return mRelativeTf;
...
As you can see, compiler optimizations make the execution order not perfectly linear.
Next I see some instructions from const Eigen::Vector6d& Frame::getSpatialVelocity() const again. But then all of a sudden I see _RandomAccessIterator pop up:
(gdb) si
0x0000003ff7d70058 150 return mVelocity;
(gdb)
0x0000003ff7d7005a 150 return mVelocity;
(gdb)
0x0000003ff7d3ec9e in virtual thunk to dart::dynamics::BodyNode::getRelativeSpatialVelocity() const () at /usr/include/c++/10/bits/stl_heap.h:421
421 while (__last - __first > 1)
(gdb) frame
#0 0x0000003ff7d3ec9e in virtual thunk to dart::dynamics::BodyNode::getRelativeSpatialVelocity() const () at /usr/include/c++/10/bits/stl_heap.h:421
421 while (__last - __first > 1)
That still passes fine and I see code from
0x0000003ff7cf80c8 in dart::dynamics::Joint::getRelativeSpatialVelocity() const@plt () from /root/dart-6.9.2/obj-riscv64-linux-gnu/lib/libdart.so.6
At this point I am at:
(gdb) bt
#0 0x0000003ff7dac22a in dart::dynamics::Joint::getRelativeSpatialVelocity (this=0x2aaab2cea0) at ./dart/dynamics/Joint.cpp:355
#1 0x0000003ff7d7005c in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:150
#2 dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:136
#3 0x0000003ff7d71996 in dart::dynamics::Frame::getSpatialVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>,
_inCoordinatesOf=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>) at ./dart/dynamics/Frame.cpp:168
#4 0x0000003ff7d71aca in dart::dynamics::Frame::getLinearVelocity (this=this@entry=0x2aaab2e768, _relativeTo=_relativeTo@entry=0x3ff7fe0318 <dart::dynamics::Frame::World()::world>,
_inCoordinatesOf=<optimized out>) at ./dart/dynamics/Frame.cpp:222
#5 0x0000002aaaacb390 in testContactWithKinematicJoint (lcpSolver=..., tol=9.9999999999999995e-07) at ./unittests/unit/test_ContactConstraint.cpp:88
#6 0x0000002aaaacba42 in ContactConstraint_ContactWithKinematicJoint_Test::TestBody (this=<optimized out>) at /usr/include/c++/10/new:175
#7 0x0000002aaaaeedea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x2aaaaf29c8 "the test body", method=<optimized out>, object=0x2aaab23ce0)
at ./unittests/gtest/src/gtest-internal-inl.h:807
#8 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x2aaab23ce0, method=<optimized out>,
location=location@entry=0x2aaaaf29c8 "the test body") at ./unittests/gtest/src/gtest.cc:2479
#9 0x0000002aaaae7502 in testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2517
#10 testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2508
#11 0x0000002aaaae75f0 in testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2693
#12 testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2667
#13 0x0000002aaaae7680 in testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2810
#14 testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2796
#15 0x0000002aaaae7a30 in testing::internal::UnitTestImpl::RunAllTests (this=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:592
#16 0x0000002aaaaef21a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:807
#17 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2aaab22b80, method=<optimized out>,
location=location@entry=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)") at ./unittests/gtest/src/gtest.cc:2479
#18 0x0000002aaaae7b46 in testing::UnitTest::Run (this=0x2aaab0a0e0 <testing::UnitTest::GetInstance()::instance>) at ./unittests/gtest/include/gtest/gtest.h:1340
#19 0x0000002aaaac5160 in RUN_ALL_TESTS () at ./unittests/gtest/include/gtest/gtest.h:2341
#20 main (argc=<optimized out>, argv=0x3ffffff458) at ./unittests/gtest/src/gtest_main.cc:36
Returning further from here ...
A call to
(gdb) bt
#0 0x0000003ff7d819fa in non-virtual thunk to dart::dynamics::GenericJoint<dart::math::SE3Space>::updateRelativeSpatialVelocity() const () at ./dart/dynamics/Joint.cpp:668
#1 0x0000003ff7dac23c in dart::dynamics::Joint::getRelativeSpatialVelocity (this=0x2aaab2cea0) at ./dart/dynamics/Joint.cpp:357
#2 0x0000003ff7d7005c in dart::dynamics::Frame::getSpatialVelocity (this=0x2aaab2e768) at ./dart/dynamics/Frame.cpp:150
Further through
dart::dynamics::GenericJoint<dart::math::SE3Space>::getRelativeJacobianStatic (this=0x2aaab2ca20) at ./dart/dynamics/detail/GenericJoint.hpp:1588
Then I seem to be back up at the test within testContactWithKinematicJoint
(gdb) bt
#0 testContactWithKinematicJoint (lcpSolver=..., tol=<optimized out>) at ./unittests/unit/test_ContactConstraint.cpp:48
#1 0x0000002aaaacba42 in ContactConstraint_ContactWithKinematicJoint_Test::TestBody (this=<optimized out>) at /usr/include/c++/10/new:175
#2 0x0000002aaaaeedea in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x2aaaaf29c8 "the test body", method=<optimized out>, object=0x2aaab23ce0)
at ./unittests/gtest/src/gtest-internal-inl.h:807
#3 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x2aaab23ce0, method=<optimized out>,
location=location@entry=0x2aaaaf29c8 "the test body") at ./unittests/gtest/src/gtest.cc:2479
#4 0x0000002aaaae7502 in testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2517
#5 testing::Test::Run (this=0x2aaab23ce0) at ./unittests/gtest/src/gtest.cc:2508
#6 0x0000002aaaae75f0 in testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2693
#7 testing::TestInfo::Run (this=0x2aaab22a20) at ./unittests/gtest/src/gtest.cc:2667
#8 0x0000002aaaae7680 in testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2810
#9 testing::TestCase::Run (this=0x2aaab22de0) at ./unittests/gtest/src/gtest.cc:2796
#10 0x0000002aaaae7a30 in testing::internal::UnitTestImpl::RunAllTests (this=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:592
#11 0x0000002aaaaef21a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x2aaab22b80) at ./unittests/gtest/src/gtest-internal-inl.h:807
#12 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2aaab22b80, method=<optimized out>,
location=location@entry=0x2aaaaf2bd0 "auxiliary test code (environments or event listeners)") at ./unittests/gtest/src/gtest.cc:2479
#13 0x0000002aaaae7b46 in testing::UnitTest::Run (this=0x2aaab0a0e0 <testing::UnitTest::GetInstance()::instance>) at ./unittests/gtest/include/gtest/gtest.h:1340
#14 0x0000002aaaac5160 in RUN_ALL_TESTS () at ./unittests/gtest/include/gtest/gtest.h:2341
#15 main (argc=<optimized out>, argv=0x3ffffff458) at ./unittests/gtest/src/gtest_main.cc:36
Something suspicious here:
(gdb) n
77 for (auto i = 0u; i < 100; ++i)
(gdb)
Cannot access memory at address 0x1000
Remember that 0x1000 is the address where my backtrace eventually ends up dead and stuck.
From here I can't go on: the instruction pointer is stuck and gdb can't continue anymore.
The last sign of life before this was at:
testContactWithKinematicJoint (lcpSolver=..., tol=9.9999999999999995e-07) at ./unittests/gtest/include/gtest/gtest.h:317
317 operator bool() const { return success_; } // NOLINT
A gdb "next" command from there got me to the point where execution is dead. The return from there must have killed it, but I don't see how.
(gdb) info registers
ra 0x2aaaacba42 0x2aaaacba42 <ContactConstraint_ContactWithKinematicJoint_Test::TestBody()+120>
sp 0x3fffffef30 0x3fffffef30
gp 0x2aaab09ac0 0x2aaab09ac0
tp 0x3ff44d2720 0x3ff44d2720
t0 0x2aaab3be98 183252532888
t1 0x1000 4096
t2 0xfffffffffffffe68 -408
fp 0x2aaab23ce0 0x2aaab23ce0
s1 0x3ff7ffd8a8 274743679144
a0 0x2aaab23e00 183252434432
a1 0x0 0
a2 0x2aaaaef658 183252219480
a3 0x1 1
a4 0x5a6359d952f93000 6513148276042706944
a5 0x5a6359d952f93000 6513148276042706944
a6 0x0 0
a7 0x2aaab2d1d8 183252472280
s2 0x3fffffef38 274877902648
s3 0x3ff7ffd8a8 274743679144
s4 0x2aaaaf29c8 183252232648
s5 0x2aaaad0e0c 183252094476
s6 0x0 0
s7 0x2aaab207e0 183252420576
s8 0x1 1
s9 0x0 0
s10 0x2aaab0a234 183252329012
s11 0x1 1
t3 0x3ff6ea9dc6 274725510598
t4 0x15 21
t5 0x1 1
t6 0x10 16
pc 0x2aaaacb554 0x2aaaacb554 <testContactWithKinematicJoint(std::shared_ptr<dart::constraint::BoxedLcpSolver> const&, double)+5146>
It seems like the program flow itself was broken; register t1 holds that suspicious 0x1000, but I'm not a riscv64 expert.
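To double-check which registers carry the suspicious value, the register dump can be scanned mechanically. This is only a small sketch: the function name `registers_holding` is made up for illustration, and the sample text is a subset of the `info registers` lines above. It says nothing about why t1 holds 0x1000, only where the value shows up.

```python
import re

# A few lines copied verbatim from the `info registers` output above.
REGISTER_DUMP = """\
ra 0x2aaaacba42 0x2aaaacba42 <ContactConstraint_ContactWithKinematicJoint_Test::TestBody()+120>
sp 0x3fffffef30 0x3fffffef30
t0 0x2aaab3be98 183252532888
t1 0x1000 4096
t2 0xfffffffffffffe68 -408
t6 0x10 16
pc 0x2aaaacb554 0x2aaaacb554 <testContactWithKinematicJoint(std::shared_ptr<dart::constraint::BoxedLcpSolver> const&, double)+5146>
"""

# Register name, whitespace, then the raw hexadecimal value.
REGISTER_LINE = re.compile(r"^(\w+)\s+(0x[0-9a-fA-F]+)\b")


def registers_holding(dump, value):
    """Return the names of all registers in dump whose raw value equals value."""
    names = []
    for line in dump.splitlines():
        match = REGISTER_LINE.match(line.strip())
        if match and int(match.group(2), 16) == value:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(registers_holding(REGISTER_DUMP, 0x1000))  # prints ['t1']
```

Run against the full dump, the same check should again report only t1, since no other register listed above holds 0x1000.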
This might after all be a gcc-10 bug, as it seemed to work fine with gcc-9 (or gcc-10 now exposes a bug in dart). I can't derive an immediate fix from this, but I hope it helps you to see what might be going on.
Environment
Expected Behavior
Build and tests work fine
Current Behavior
On s390x the build-time tests fail.
Here is the initial build log.
I have rebuilt the same with verbosity as recommended - the build is in this PPA and you can access the logs for all riscv64 from there.
Steps to Reproduce