autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0

freespace_planning_algorithms-test fails #2439

Closed kosuke55 closed 1 year ago

kosuke55 commented 1 year ago

Description

1: [ RUN      ] RRTStarTestSuite.Update
1: measuring average performance ...
1: plan success : 201.148[msec], solution cost : 43.6063
1: [INFO] [1668999471.297964168] [rosbag2_storage]: Opened database '/tmp/fpalgos-rrtstar_update-case0/fpalgos-rrtstar_update-case0_0.db3' for READ_WRITE.
1: measuring average performance ...
1/6 Test #1: freespace_planning_algorithms-test ...***Timeout  60.00 sec
test 2
    Start 2: rrtstar_core_informed-test
1: [ RUN      ] AstarSearchTestSuite.SingleCurvature
1: plan success : 598.159[msec], solution cost : 34.2057
1: malloc(): invalid size (unsorted)
1: -- run_test.py: return code -6
1: -- run_test.py: generate result file '/__w/autoware.universe/autoware.universe/build/freespace_planning_algorithms/test_results/freespace_planning_algorithms/freespace_planning_algorithms-test.gtest.xml' with failed test
1: -- run_test.py: verify result file '/__w/autoware.universe/autoware.universe/build/freespace_planning_algorithms/test_results/freespace_planning_algorithms/freespace_planning_algorithms-test.gtest.xml'
1/6 Test #1: freespace_planning_algorithms-test ...***Failed    1.10 sec

Expected behavior

all tests pass

Actual behavior

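Tests fail: RRTStarTestSuite.Update hits the 60-second timeout, and AstarSearchTestSuite.SingleCurvature aborts with "malloc(): invalid size (unsorted)".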

Steps to reproduce

run tests

Versions

No response

Possible causes

No response

Additional context

No response

kosuke55 commented 1 year ago

cc @HiroIshida If you know anything about this, I would appreciate it if you could let me know.

HiroIshida commented 1 year ago

1) As for RRT-star, the algorithm can occasionally fail to find a solution because of its random nature. One workaround would be to fix the random seed of the planner.

2) As for the second issue, I have no idea. Does it happen only occasionally, with the tests usually passing? Also, did it start occurring recently?

I can work on these issues, but I won't have time until 12/9.
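As a rough illustration of the seed workaround, here is a minimal sketch of a sampler whose output depends only on an explicit seed. The names `Sample` and `draw_samples` are hypothetical and are not part of freespace_planning_algorithms; how the actual RRT* planner draws its samples is not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical 2D sample type (an assumption for illustration only).
struct Sample
{
  double x;
  double y;
};

// Draws `n` samples from [0, 1) x [0, 1) using an engine seeded with `seed`.
// The output sequence of std::mt19937 is fixed by the C++ standard, so the
// same seed yields the same samples on every run.
std::vector<Sample> draw_samples(std::uint32_t seed, std::size_t n)
{
  std::mt19937 engine(seed);
  std::vector<Sample> samples;
  samples.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    // Scale the raw 32-bit output by 2^32 instead of using
    // std::uniform_real_distribution, whose algorithm is implementation-defined.
    const double x = engine() / 4294967296.0;
    const double y = engine() / 4294967296.0;
    samples.push_back({x, y});
  }
  return samples;
}
```

With a fixed seed, a test that passes once should keep sampling the same points on later runs, so a failure to find a solution would at least be reproducible rather than intermittent.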

kosuke55 commented 1 year ago

Thanks for the reply. (And good luck with your thesis!)

  1. As for RRT-star, the algorithm could occasionally fail at finding a solution because of its random nature. One workaround for this could be fixing the random seed of the planner.

Fixing the random seed seems good.

  2. As for the second issue, I have no idea. Does it happen only occasionally, with the tests usually passing? Also, did it start occurring recently?

It seems to happen occasionally, not every time. It has been seen since switching to Humble.

kosuke55 commented 1 year ago

These tests have been temporarily commented out in https://github.com/autowarefoundation/autoware.universe/pull/2440. We need to fix them.

HiroIshida commented 1 year ago

When I ran the node on Humble (in a clean Docker environment), I didn't get any errors.

When I compiled the code with AddressSanitizer (https://github.com/google/sanitizers/wiki/AddressSanitizer), it reported some memory leaks around rosbag2, so there is a chance that libclass_loader.so from the class_loader package (https://github.com/ros/class_loader) is related to the bug.

AddressSanitizer didn't report any memory errors other than that.

h-ishida@03238e19ce66:~/autoware/src/universe/autoware.universe/planning/freespace_planning_algorithms$ ./build/freespace_planning_algorithms-test 
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from AstarSearchTestSuite
[ RUN      ] AstarSearchTestSuite.SingleCurvature
plan success : 582.046[msec], solution cost : 34.2057
[INFO] [1671047305.929521507] [rosbag2_storage]: Opened database '/tmp/fpalgos-astar_single-case0/fpalgos-astar_single-case0_0.db3' for READ_WRITE.
plan success : 5531.43[msec], solution cost : 33.5224
[INFO] [1671047311.723573118] [rosbag2_storage]: Opened database '/tmp/fpalgos-astar_single-case1/fpalgos-astar_single-case1_0.db3' for READ_WRITE.
plan success : 179.765[msec], solution cost : 37.5654
[INFO] [1671047312.022203706] [rosbag2_storage]: Opened database '/tmp/fpalgos-astar_single-case2/fpalgos-astar_single-case2_0.db3' for READ_WRITE.
plan success : 1588.64[msec], solution cost : 42.8882
[INFO] [1671047313.720065870] [rosbag2_storage]: Opened database '/tmp/fpalgos-astar_single-case3/fpalgos-astar_single-case3_0.db3' for READ_WRITE.
[       OK ] AstarSearchTestSuite.SingleCurvature (8651 ms)
[----------] 1 test from AstarSearchTestSuite (8652 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8652 ms total)
[  PASSED  ] 1 test.

=================================================================
==849==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f2278c341c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f2273be6d2b  (<unknown module>)
    #2 0x7f227959147d  (/lib64/ld-linux-x86-64.so.2+0x647d)

Indirect leak of 213 byte(s) in 4 object(s) allocated from:
    #0 0x7f2278c341c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f2277614e6e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (/lib/x86_64-linux-gnu/libstdc++.so.6+0x14be6e)

Indirect leak of 152 byte(s) in 1 object(s) allocated from:
    #0 0x7f2278c341c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f2278a3f47d in class_loader::impl::AbstractMetaObjectBase::AbstractMetaObjectBase(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (/opt/ros/humble/lib/libclass_loader.so+0x947d)

Indirect leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7f2278c341c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f2278a3d45a  (/opt/ros/humble/lib/libclass_loader.so+0x745a)

SUMMARY: AddressSanitizer: 389 byte(s) leaked in 7 allocation(s).
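
For readers unfamiliar with the report above, the following minimal sketch shows how "direct" and "indirect" leaks typically arise. `PluginInfo` and `make_plugin_info` are hypothetical names chosen for illustration; they are not class_loader's actual types. With GCC or Clang, building with `-fsanitize=address -g` enables AddressSanitizer, and on x86_64 Linux LeakSanitizer runs at process exit by default.

```cpp
#include <string>

// Hypothetical metadata record (an assumption; not class_loader's real type).
struct PluginInfo
{
  std::string class_name;
};

// Allocates a record on the heap and hands ownership to the caller.
// If the caller drops the pointer without deleting it, LeakSanitizer reports:
//  - a direct leak for the PluginInfo object itself, and
//  - an indirect leak for the heap buffer owned by class_name, because that
//    buffer is reachable only through the leaked object.
// Strings short enough for the small-string optimization own no separate
// heap buffer, so they produce no indirect leak.
PluginInfo * make_plugin_info(const std::string & name)
{
  return new PluginInfo{name};
}
```

Leaks like these are reported at exit and do not by themselves corrupt the heap, so the report points at an area worth checking rather than confirming the cause of the malloc() abort.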

class_loader is used by rosbag2 (see the ldd output below), so removing the rosbag2 dependency would probably fix the issue.

h-ishida@03238e19ce66:~/autoware/src/universe/autoware.universe/planning/freespace_planning_algorithms/build$ ldd /opt/ros/humble/lib/librosbag2_storage.so
    linux-vdso.so.1 (0x00007f2265015000)
    libyaml-cpp.so.0.7 => /lib/x86_64-linux-gnu/libyaml-cpp.so.0.7 (0x00007f2264f40000)
    libament_index_cpp.so => /opt/ros/humble/lib/libament_index_cpp.so (0x00007f2264f35000)
    libclass_loader.so => /opt/ros/humble/lib/libclass_loader.so (0x00007f2264f22000)
    librcpputils.so => /opt/ros/humble/lib/librcpputils.so (0x00007f2264f12000)
    librcutils.so => /opt/ros/humble/lib/librcutils.so (0x00007f2264efa000)
    libconsole_bridge.so.1.0 => /lib/x86_64-linux-gnu/libconsole_bridge.so.1.0 (0x00007f2264ef4000)
    libtinyxml2.so.9 => /lib/x86_64-linux-gnu/libtinyxml2.so.9 (0x00007f2264edc000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2264cb2000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2264c92000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2264a68000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2265017000)

HiroIshida commented 1 year ago

@kosuke55 In summary, I think one of the following PRs will fix the issue.

If the error is related to rosbag, then this PR fixes it: https://github.com/autowarefoundation/autoware.universe/pull/2504

If the error is caused by planning failing due to a timeout, the following PR fixes the issue: https://github.com/autowarefoundation/autoware.universe/pull/2505. However, since it is almost impossible for A* to take more than 10 seconds to solve a problem, this PR probably will not fix the issue. That said, if the tests in CI run in parallel with other processes, A* might not get enough CPU time and could take much longer than normal. Do you know whether the tests in CI run in a single process?
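
To make the CPU-contention concern concrete, here is a minimal sketch of a search loop guarded by a wall-clock time limit. `SearchStatus` and `run_with_deadline` are hypothetical names, not the planner's real API; the sketch only assumes that the planner checks elapsed wall-clock time between iterations.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Outcome of a time-limited search (hypothetical; for illustration only).
enum class SearchStatus { kSolved, kTimedOut, kExhausted };

// Calls `step` until it returns true (solved), `max_iterations` is reached,
// or `time_limit` of wall-clock time has elapsed since the call started.
SearchStatus run_with_deadline(
  const std::function<bool()> & step, std::size_t max_iterations,
  std::chrono::milliseconds time_limit)
{
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < max_iterations; ++i) {
    // Wall-clock check: time during which the OS has descheduled this
    // process counts against the limit as well.
    if (std::chrono::steady_clock::now() - start >= time_limit) {
      return SearchStatus::kTimedOut;
    }
    if (step()) {
      return SearchStatus::kSolved;
    }
  }
  return SearchStatus::kExhausted;
}
```

Because the limit is measured in wall-clock time, the same number of iterations can finish well within the limit on an idle machine and exceed it on a CI runner where several test processes compete for the same cores.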