ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

cyberRT segment fault on ARM for compile option O1 O2 O3 #11334

Open Allenhe123 opened 4 years ago

Allenhe123 commented 4 years ago

Describe the bug: cyberRT crashes with a segmentation fault on ARM when built with compile option -O1, -O2, or -O3.

  1. Edit the file cyber.pb.conf and set default_proc_num to a value greater than 1, for example: scheduler_conf { routine_num: 5 default_proc_num: 2 }

  2. Then run a talker/listener demo; it will crash.

My code is as below:

using namespace std;
using namespace apollo::cyber;
apollo::cyber::Init(argv[0]);

std::mutex mtx1;
std::vector<string> recv_msgs1;
auto talkernode = CreateNode("reader_a");
string channelname("messaging");
auto talker = talkernode->CreateWriter<string>(channelname);
auto listener = talkernode->CreateReader<string>(channelname,
    [&](const std::shared_ptr<string>& msg) {
        std::lock_guard<std::mutex> lck(mtx1);
        cout << "reader_a recv a msg" << endl;
        recv_msgs1.emplace_back(*msg);
    });

for (;;) {
    auto msg1 = std::make_shared<string>("hello  world!");
    talker->Write(msg1);
    std::this_thread::sleep_for(std::chrono::duration<int, std::milli>(500));
    cout << "talker sent a msg" << endl;
    std::cout << "recvmsg size: " << recv_msgs1.size() << std::endl;
}
return;

Allenhe123 commented 4 years ago

The issue is caused by the croutine context switch. The error takes place near this code:

CRoutine::Resume() {
  ... ...
  SwapContext(GetMainStack(), GetStack());
  current_routine_ = nullptr;
  return state_;
}
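For readers unfamiliar with the pattern in CRoutine::Resume(), here is a minimal sketch of a resume/yield context switch. It assumes Linux with glibc and uses POSIX ucontext (swapcontext) as a stand-in. It is not Apollo's SwapContext implementation, and every name in it (sketch::Routine, Resume, YieldToMain, RunDemo) is hypothetical.

```cpp
#include <ucontext.h>

#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a coroutine; this is NOT Apollo's CRoutine.
namespace sketch {

enum class State { READY, FINISHED };

struct Routine {
  ucontext_t main_ctx{};     // scheduler side (the "main stack")
  ucontext_t routine_ctx{};  // coroutine side
  std::vector<char> stack = std::vector<char>(64 * 1024);
  State state = State::READY;
  std::vector<std::string>* log = nullptr;
};

// Per-thread pointer to the routine being run, set around each switch.
thread_local Routine* current = nullptr;

// Called from inside the coroutine: save its context, return to the scheduler.
void YieldToMain() {
  Routine* self = current;
  assert(self != nullptr);
  swapcontext(&self->routine_ctx, &self->main_ctx);
}

// Coroutine body: runs on the routine's own stack.
void Entry() {
  Routine* self = current;
  self->log->push_back("step 1");
  YieldToMain();
  self->log->push_back("step 2");
  self->state = State::FINISHED;
  // Returning here switches to uc_link, i.e. back to main_ctx.
}

void Init(Routine& r, std::vector<std::string>* log) {
  r.log = log;
  getcontext(&r.routine_ctx);
  r.routine_ctx.uc_stack.ss_sp = r.stack.data();
  r.routine_ctx.uc_stack.ss_size = r.stack.size();
  r.routine_ctx.uc_link = &r.main_ctx;
  makecontext(&r.routine_ctx, Entry, 0);
}

// Same shape as the snippet above: mark current, switch, clear, report state.
State Resume(Routine& r) {
  current = &r;
  swapcontext(&r.main_ctx, &r.routine_ctx);
  current = nullptr;
  return r.state;
}

std::vector<std::string> RunDemo() {
  std::vector<std::string> log;
  Routine r;
  Init(r, &log);
  while (Resume(r) != State::FINISHED) {
    log.push_back("back in main");
  }
  return log;
}

}  // namespace sketch
```

Calling sketch::RunDemo() resumes the routine twice on a single thread. The sketch does not reproduce the crash; it only shows where control leaves and re-enters Resume().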

zhangyangang commented 3 years ago

Any update?

xuming7up commented 2 years ago

solution 1: In file third_party/gpus/crosstools/cc_toolchain_config.bzl.tpl:357, add the one line marked with "+":

flags = [
    "-g0",
    "-O2",
+   "-fPIC",
    "-ffunction-sections",
    ......................

solution 2: In file third_party/gpus/crosstools/cc_toolchain_config.bzl.tpl:328, comment out the lines below:

#flag_group(
#    flags = ["-fPIE"],
#    expand_if_not_available = "pic",
#),

solution 3: In file tools/bazel.rc, add on any line: build --copt="-fPIC" or build --cxxopt="-fPIC"

All 3 solutions above work around this bug, but I cannot understand:

  1. -fPIC is already added in third_party/gpus/crosstools/cc_toolchain_config.bzl.tpl, so why does adding it again (solution 1) help?
  2. After removing -fPIE the issue disappears; I cannot explain why.
  3. Why does it not happen on the x86_64 platform, with either "build_opt" or "build_dbg"?
  4. Why does it happen only on aarch64 "build_opt", and not on aarch64 "build_opt" with -O2 removed (you can remove -O2 in the file cc_toolchain_config.bzl.tpl to verify)?
  5. The issue does not happen on aarch64 "build_dbg".
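One way to narrow down questions like these is to check which flags actually reach a given translation unit. The sketch below assumes GCC or Clang, which predefine __PIC__, __PIE__ and __OPTIMIZE__ according to the -fPIC/-fPIE/-O options in effect. DescribeBuildFlags is a hypothetical helper name and is not part of Apollo.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Apollo): summarizes the code-generation
// macros that GCC/Clang predefine for the current translation unit.
std::string DescribeBuildFlags() {
  std::string out;

#if defined(__aarch64__)
  out += "arch=aarch64";
#elif defined(__x86_64__)
  out += "arch=x86_64";
#else
  out += "arch=other";
#endif

  // __PIC__ is 1 for -fpic and 2 for -fPIC; GCC also defines it under -fPIE.
#if defined(__PIC__)
  out += " pic=" + std::to_string(__PIC__);
#else
  out += " pic=off";
#endif

  // __PIE__ is 1 for -fpie and 2 for -fPIE.
#if defined(__PIE__)
  out += " pie=" + std::to_string(__PIE__);
#else
  out += " pie=off";
#endif

  // __OPTIMIZE__ is defined whenever optimization (-O1 and above) is on.
#if defined(__OPTIMIZE__)
  out += " opt=on";
#else
  out += " opt=off";
#endif

  assert(!out.empty());
  return out;
}
```

Printing DescribeBuildFlags() from a file built under each configuration (for example build_opt versus build_dbg, or with and without the -fPIE flag group) shows whether the flags differ between the crashing and non-crashing builds. It does not by itself explain the root cause.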

Test steps:
./apollo.sh build_opt (or build_opt_gpu)
./bazel-bin/cyber/example/listener
./bazel-bin/cyber/example/talk

The listener will crash with a segmentation fault.

Can the Apollo team explain the root cause based on my testing?