DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

"sigaltstack too small in native thread" in detach_signal test #6880

Open derekbruening opened 2 months ago

derekbruening commented 2 months ago

This happened once on the aarch64-sve-precommit-256 test:

https://github.com/DynamoRIO/dynamorio/actions/runs/9914012718/job/27392240520?pr=6879

2024-07-12T21:08:37.5756527Z 358: detaching
2024-07-12T21:08:37.5756915Z 358: signal count post-detach: 66757
2024-07-12T21:08:37.5757319Z 358: native signals delivered: 59542
2024-07-12T21:08:37.5758963Z 358: <Application /opt/actions-runner/_work/dynamorio/dynamorio/build/build_debug-internal-64/suite/tests/bin/api.detach_signal (20510). Cannot correctly handle received signal 12 in thread 20513: sigaltstack too small in native thread.>

This code was recently changed by #6815 so it is possible this is an introduced regression.

abhinav92003 commented 2 months ago

> This code was recently changed by #6815 so it is possible this is an introduced regression.

The change in #6815 was to avoid running out of stack space by reusing the existing frame during native signal delivery for an almost-detached thread (one that has only the removal of DR's main_signal_handler left). This should actually make "sigaltstack too small in native thread" less likely, since we're now using less stack space for native signal delivery.

We'll need more info on the exact sequence of events happening here.

AssadHashmi commented 2 months ago

Could this regression have something to do with https://github.com/DynamoRIO/dynamorio/pull/6868 merged 3 days ago?

derekbruening commented 2 months ago

I logged into the aarch64-precommit machine but can't reproduce it there:

derek@dynamorio:~/dr/build$ ctest --repeat-until-fail 500 -R detach_signal
Test project /home/derek/dr/build
    Start 351: code_api|api.detach_signal
    Test #351: code_api|api.detach_signal .......   Passed    0.22 sec
    Start 351: code_api|api.detach_signal
    Test #351: code_api|api.detach_signal .......   Passed    0.21 sec
...
1/1 Test #351: code_api|api.detach_signal .......   Passed    0.19 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 104.28 sec

I don't have access to the aarch64-sve-precommit machine, which is where it failed. @AssadHashmi, maybe you could run it on that machine 1000x and see if it reproduces? If so, maybe removing #6868 and repeating would show whether that is the culprit?