Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

Crash likely caused by stack overflow in recursive functions (or using Intel i9-13900K/14900K CPU) #5449

Closed memN0ps closed 4 months ago

memN0ps commented 5 months ago

Version and Platform (required):

Bug Description:

Binary Ninja crashes when loading ntoskrnl.exe after 10-15 seconds.

Steps To Reproduce:

  1. Go to "File"
  2. Click on "Open"
  3. Select ntoskrnl.exe
  4. Wait for Binary Ninja to crash

Expected Behavior:

The expected behavior is that it should not crash.

Additional Information:

Stack trace output from WinDbg on version: 4.1.5339-dev

This exception may be expected and handled.
binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x1b09e:
00007ffa`31006aae 498b4f50        mov     rcx,qword ptr [r15+50h] ds:00000000`00000050=????????????????
0:046> k
 # Child-SP          RetAddr               Call Site
00 00000020`155fe290 00007ffa`31009943     binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x1b09e
01 00000020`155fe6c0 00007ffa`30ffe94b     binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x1df33
02 00000020`155feaa0 00007ffa`307d07d8     binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x12f3b
03 00000020`155fee00 00007ffa`3100855e     binaryninjacore+0x5607d8
04 00000020`155fef00 00007ffa`3100828d     binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x1cb4e
05 00000020`155ff040 00007ffa`3118355b     binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x1c87d
06 00000020`155ff1a0 00007ffa`30f3f646     binaryninjacore!BNGetHighLevelILVariables+0x31c7b
07 00000020`155ff5c0 00007ffa`30f0fb9e     binaryninjacore!BNSetFlowGraphNodeLines+0xf7926
08 00000020`155ff720 00007ffa`315e821e     binaryninjacore!BNSetFlowGraphNodeLines+0xc7e7e
09 00000020`155ff750 00007ffa`3178e4a8     binaryninjacore!BNTagTypeSetVisible+0x4f2e
0a 00000020`155ff7e0 00007ffa`31789201     binaryninjacore!BNWriteWebsocketClientData+0x85b8
0b 00000020`155ff930 00007ffb`07679333     binaryninjacore!BNWriteWebsocketClientData+0x3311
0c 00000020`155ff960 00007ffb`093e257d     ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x93
0d 00000020`155ff990 00007ffb`0a0caa48     KERNEL32!BaseThreadInitThunk+0x1d
0e 00000020`155ff9c0 00000000`00000000     ntdll!RtlUserThreadStart+0x28

xusheng6 commented 5 months ago

I have tried on both Windows and macOS, and cannot reproduce this. @memN0ps do you have any plugins that could interfere with this? How much RAM do you have on your computer?

psifertex commented 5 months ago

You can also try https://docs.binary.ninja/guide/troubleshooting.html#troubleshooting-plugins which will show how to test without plugins or settings to see if those are involved.

notpidgey commented 5 months ago

I'm not running any plugins and I have the same issue. This happens with ntoskrnl every time but also occurs with other binaries. Unfortunately I cannot find any pattern (binary size, dll/exe, etc.) as to why this happens, but it's extremely frustrating and has been happening to me for a while.

* CPU: Intel 13900k x64
* OS: Windows 10 22H2
* Build: 4.0.4958 Personal (ddff9339)

memN0ps commented 5 months ago

Hey all,

Thanks for getting back to me.

Sorry for not reporting this earlier. This issue has been happening across different Binary Ninja builds and various Windows 10 and 11 versions.

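I have tried a fresh install of Binary Ninja in a new VM and different builds of ntoskrnl.exe. Here are the specifications for my PC:

* Processor: Intel i9-13900k
* RAM: 128GB
* GPU: RTX 4090

I've tried this on my server as well with specifications:

* Processor: Intel i9-14900k
* RAM: 128GB
* GPU: RTX 4070

I can confirm that there are no crashes when loading ntoskrnl.exe on my MacBook Pro using Binary Ninja (4.0.4958). Here are the specs for the MacBook:

* Processor: Apple M2
* OS: 14.4.1 (23E224)
* RAM: 24GB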

xusheng6 commented 5 months ago

> Hey all,
>
> Thanks for getting back to me.
>
> Sorry for not reporting this earlier. This issue has been happening across different Binary Ninja builds and various Windows 10 and 11 versions.
>
> I have tried a fresh install of Binary Ninja in a new VM and different builds of ntoskrnl.exe. Here are the specifications for my PC:
>
> * Processor: Intel i9-13900k
> * RAM: 128GB
> * GPU: RTX 4090
>
> I've tried this on my server as well with specifications:
>
> * Processor: Intel i9-14900k
> * RAM: 128GB
> * GPU: RTX 4070
>
> I can confirm that there are no crashes when loading ntoskrnl.exe on my MacBook Pro using Binary Ninja (4.0.4958). Here are the specs for the MacBook:
>
> * Processor: Apple M2
> * OS: 14.4.1 (23E224)
> * RAM: 24GB

Could you test it with a Windows VM that has a moderate number of CPU cores and a moderate amount of RAM? I suspect that, because you have so many CPU cores, this is a race condition that is hard to reproduce with fewer cores. The RAM is probably unrelated, but lowering it to 32GB might be reasonable.

xusheng6 commented 5 months ago

> I'm not running any plugins and I have the same issue. This happens with ntoskrnl every time but also occurs with other binaries. Unfortunately I cannot find any pattern (binary size, dll/exe, etc.) as to why this happens, but it's extremely frustrating and has been happening to me for a while.
>
> CPU: Intel 13900k x64 OS: Windows 10 22H2 Build: 4.0.4958 Personal (ddff9339)

Hi, @notpidgey, thx for letting us know. Do you have a binary that can reproduce it? Also, can you share with us a stack trace at the crash?

memN0ps commented 5 months ago

Hi everyone,

I can confirm that Binary Ninja version 4.0.4958 (Stable) successfully loads ntoskrnl.exe without crashing under the following conditions:

OS Information:

VM Settings: (see screenshot below) image

@xusheng6 appears to be correct.

Thanks.

memN0ps commented 5 months ago

Hey,

Is there any ETA to fix this issue? It's a bit annoying not being able to use it on the host OS given it's paid software.

I've come across another issue. I'm not sure if it's related to the above or not, but the steps to reproduce are the following:

  1. Download Binary Ninja Stable x86_64
  2. Run /binaryninja/scripts/linux-setup.sh
  3. Launch Binary Ninja from the UI or via the terminal using ./binaryninja
  4. Observe the window open for a few seconds and then crash with Segmentation fault (core dumped)

Host OS: Linux fedora 6.8.10-300.fc40.x86_64

psifertex commented 5 months ago

We have no ETA at this time. Without a local reproduction it's much more difficult to fix. It sounds like we've at least identified a near-term workaround: just limiting the number of cores? We often test on machines with larger core counts, so I suspect the OS is also at play here, and once you get it working on Fedora (see below) it should be fine even with the higher thread count.

In the future, please file unrelated problems as separate issues rather than adding on to this thread.

Fedora Linux has a known incompatibility with the Qt version we're using. If you download the latest dev branch, the fix has already landed there. (You can use https://binary.ninja/recover/ and switch the drop-down to "dev" to get that installer.)

memN0ps commented 5 months ago

Hi @psifertex,

Thanks for getting back to me.

I just downloaded the latest dev branch on the host OS Linux fedora 6.8.10-300.fc40.x86_64 and loaded the same ntoskrnl.exe file and can confirm it's working fine on the server with the specifications mentioned here: https://github.com/Vector35/binaryninja-api/issues/5449#issuecomment-2130672715

Cheers.

psifertex commented 5 months ago

Great. So it sounds like it's specific to a high thread count, Windows, and that particular file. That would explain why it hasn't been reported before.

You can also try changing the worker thread count on Windows instead of having to use a VM or boot into Linux:

https://docs.binary.ninja/guide/settings.html#analysis.limits.workerThreadCount

Hopefully we can get a good reproduction or resolve it without that soon.
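
For anyone who prefers to script the workaround, here is a minimal sketch that writes the setting into a settings file directly. It rests on several assumptions that are not confirmed in this thread: that the user settings live in a plain JSON file keyed by setting name, that on Windows the file is `%APPDATA%\Binary Ninja\settings.json`, and that Binary Ninja is closed while the file is edited. The function name `write_worker_thread_limit` is hypothetical. Changing the value in the settings UI achieves the same thing.

```python
import json
from pathlib import Path

# Setting name taken from the documentation link above.
SETTING_KEY = "analysis.limits.workerThreadCount"


def write_worker_thread_limit(settings_path: Path, count: int) -> dict:
    """Store a worker thread limit in a JSON settings file, keeping other keys."""
    if count < 1:
        raise ValueError("thread count must be at least 1")

    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    settings[SETTING_KEY] = count
    settings_path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings


# Example call (the path is an assumption, adjust it for your install):
# import os
# write_worker_thread_limit(
#     Path(os.environ["APPDATA"]) / "Binary Ninja" / "settings.json", 4
# )
```

Back up the settings file first; the sketch rewrites the whole file and will not preserve any non-JSON content.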

notpidgey commented 5 months ago

> > I'm not running any plugins and I have the same issue. This happens with ntoskrnl every time but also occurs with other binaries. Unfortunately I cannot find any pattern (binary size, dll/exe, etc.) as to why this happens, but it's extremely frustrating and has been happening to me for a while. CPU: Intel 13900k x64 OS: Windows 10 22H2 Build: 4.0.4958 Personal (ddff9339)
>
> Hi, @notpidgey, thx for letting us know. Do you have a binary that can reproduce it? Also, can you share with us a stack trace at the crash?

I apologize for the late response. I am not sure if this is any use to you but here are several crash dump stack traces I could recover:

Case 1 (most common case which happens several times):

STACK_TEXT:  
00000004`dbefddc8 00000000`00000008     : 00006b61`000001e7 00007ff9`e7a607c0 00000000`00000000 00000000`00000000 : 0x00000004`dbefde50
00000004`dbefddd0 00006b61`000001e7     : 00007ff9`e7a607c0 00000000`00000000 00000000`00000000 00000004`dbefdec9 : 0x8
00000004`dbefddd8 00007ff9`e7a607c0     : 00000000`00000000 00000000`00000000 00000004`dbefdec9 00007ff8`bc5299a3 : 0x00006b61`000001e7
00000004`dbefdde0 00007ff8`bc544822     : 00000004`dbefe160 00000000`00000000 00000004`dbefdf60 00000000`00000001 : ntdll!RtlSetLastWin32Error+0x40
00000004`dbefde30 00007ff8`bce6d922     : 00000000`00000019 00000000`00000001 00000004`dbefeaa0 00000000`00000156 : binaryninjacore+0x494822
00000004`dbefdf30 00007ff8`bce6d848     : 00000004`dbefe420 000001b8`9e5b0800 00000004`dbefe030 00000004`dbefeac0 : binaryninjacore!BNGetHighLevelILVariables+0x16332
00000004`dbefdfc0 00007ff8`bce69f09     : 000001b8`9e5b0800 000001b8`9e5b0800 00000004`dbefe4f8 00000004`dbefeaa0 : binaryninjacore!BNGetHighLevelILVariables+0x16258
00000004`dbefe060 00007ff8`bc5448a8     : 00000004`dbefe750 00000000`00000000 00000004`dbefe750 00000000`00000000 : binaryninjacore!BNGetHighLevelILVariables+0x12919
00000004`dbefe3c0 00007ff8`bce6d922     : 000001b9`278b0e01 00000000`00000000 00000000`00000001 000001b7`d564c460 : binaryninjacore+0x4948a8
00000004`dbefe4c0 00007ff8`bce62c56     : 000001b8`00000001 000001b8`9e5b0800 000001b9`278b0e60 00000000`00000001 : binaryninjacore!BNGetHighLevelILVariables+0x16332
00000004`dbefe550 00007ff8`bce8e9db     : 00000004`dbefeaa0 00000000`00000000 00000000`00000000 000001b9`04f53bf8 : binaryninjacore!BNGetHighLevelILVariables+0xb666
00000004`dbefe870 00007ff8`bce88058     : 000001b8`97502b80 00007ff8`0000000d 000001b8`cbcf4e20 00000000`0000053e : binaryninjacore!BNGetHighLevelILVariables+0x373eb
00000004`dbeff030 00007ff8`bcc4fd66     : 00000004`dbeff4c0 000001b8`bb9d4380 000001b8`5be26200 000001b8`bb9d45b8 : binaryninjacore!BNGetHighLevelILVariables+0x30a68
00000004`dbeff450 00007ff8`bcc1e5ce     : 000001b8`5be26200 000001b8`5be26200 00000000`00000006 000001b7`d704f3d8 : binaryninjacore!BNSetFlowGraphNodeLines+0xf2956
00000004`dbeff5b0 00007ff8`bd2cdb5e     : 000001b7`d7050800 00000000`00000000 00000000`00000000 00000000`0000000f : binaryninjacore!BNSetFlowGraphNodeLines+0xc11be
00000004`dbeff5e0 00007ff8`bd46a0c8     : 00000004`dbeff6f8 00000000`0000000f 00000000`00000006 00000004`dbeff770 : binaryninjacore!BNTagTypeSetVisible+0x4f4e
00000004`dbeff670 00007ff8`bd464e21     : 00000000`00000000 000001b7`d8603170 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x85b8
00000004`dbeff7c0 00007ff9`e5841bb2     : 000001b7`d8e92440 00000000`00000000 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x3311
00000004`dbeff7f0 00007ff9`e5dc7344     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
00000004`dbeff820 00007ff9`e7a626b1     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
00000004`dbeff850 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

Other:

00000058`895fe9d8 00007ff8`ff595ce2     : 00000058`895fea70 00000000`00000008 00000000`00000000 00007ff9`006b64d1 : VCRUNTIME140!memset+0x1d4
00000058`895fe9e0 00007ff8`ff5799a3     : 00000058`895fea70 00000058`895feb90 00000058`895ff208 00000058`895fef98 : binaryninjacore+0x495ce2
00000058`895fea20 00007ff8`ff594822     : 00000058`895fef40 00000000`00000000 00000058`895feb90 00000000`00000000 : binaryninjacore+0x4799a3
00000058`895fea50 00007ff8`ffebd600     : 00000000`00000000 00000228`0f00a1f0 0000022a`13540140 00000000`00000000 : binaryninjacore+0x494822
00000058`895feb50 00007ff8`ffe90bc1     : 0000022a`1aaf0000 00000000`000000d0 00000000`00000001 00007ff8`ffe0792a : binaryninjacore!BNGetHighLevelILVariables+0x16010
00000058`895febf0 00007ff8`ffe91809     : 00000000`00000000 0000022a`16641380 0000022a`0ee64600 0000022a`13540140 : binaryninjacore!BNUpdateHighLevelILOperand+0x770f1
00000058`895fecc0 00007ff8`ffe02980     : 00000058`895ff200 0000022a`0ee64600 0000022a`0ee64600 00000000`00000001 : binaryninjacore!BNUpdateHighLevelILOperand+0x77d39
00000058`895ff1e0 00007ff8`ffd66083     : 0000022a`0ee64600 00000058`895ff720 00000058`895ff720 00000000`00000001 : binaryninjacore!BNRegisterGlobalFunctionRecognizer+0xb8fd0
00000058`895ff410 00007ff8`ffed828b     : 00000058`895ff720 0000022a`183d1e00 00007ff9`04453f28 00000058`895ffa18 : binaryninjacore!BNRegisterGlobalFunctionRecognizer+0x1c6d3
00000058`895ff570 00007ff8`ffc9fd66     : 00000058`895ffa00 00000000`00000000 00000228`47204d00 00000228`67c4e628 : binaryninjacore!BNGetHighLevelILVariables+0x30c9b
00000058`895ff990 00007ff8`ffc6e5ce     : 00000228`47204d00 00000228`47204d00 00000000`00000006 00000228`67c4e628 : binaryninjacore!BNSetFlowGraphNodeLines+0xf2956
00000058`895ffaf0 00007ff9`0031db5e     : 00000228`67c4ed90 00000000`00000000 00000000`00000000 00000000`0000000f : binaryninjacore!BNSetFlowGraphNodeLines+0xc11be
00000058`895ffb20 00007ff9`004ba0c8     : 00000058`895ffc38 00000000`0000000f 00000000`00000006 00000058`895ffcb0 : binaryninjacore!BNTagTypeSetVisible+0x4f4e
00000058`895ffbb0 00007ff9`004b4e21     : 00000000`00000000 00000228`11f00e40 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x85b8
00000058`895ffd00 00007ff9`e5841bb2     : 00000228`1765ec40 00000000`00000000 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x3311
00000058`895ffd30 00007ff9`e5dc7344     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
00000058`895ffd60 00007ff9`e7a626b1     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
00000058`895ffd90 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

0000005a`077ff058 00000000`00000008     : 0000018f`5b9b0920 00000000`00000001 00000192`76b70580 0000005a`077ff410 : 0x0000005a`077ff100
0000005a`077ff060 0000018f`5b9b0920     : 00000000`00000001 00000192`76b70580 0000005a`077ff410 00000191`999b25f0 : 0x8
0000005a`077ff068 00000000`00000001     : 00000192`76b70580 0000005a`077ff410 00000191`999b25f0 00007ff8`bc528dca : 0x0000018f`5b9b0920
0000005a`077ff070 00000192`76b70580     : 0000005a`077ff410 00000191`999b25f0 00007ff8`bc528dca 0000005a`077ff100 : 0x1
0000005a`077ff078 0000005a`077ff410     : 00000191`999b25f0 00007ff8`bc528dca 0000005a`077ff100 00000000`00000000 : 0x00000192`76b70580
0000005a`077ff080 00000191`999b25f0     : 00007ff8`bc528dca 0000005a`077ff100 00000000`00000000 00000000`00000001 : 0x0000005a`077ff410
0000005a`077ff088 00007ff8`bc528dca     : 0000005a`077ff100 00000000`00000000 00000000`00000001 00000000`00000000 : 0x00000191`999b25f0
0000005a`077ff090 00007ff8`bc544b33     : 00007ff8`bcdb7d40 00000000`00000000 0000005a`077ff1e0 0000005a`077ff4d0 : binaryninjacore+0x478dca
0000005a`077ff0e0 00007ff8`bcdb2871     : 00000190`de243c00 0000005a`077ff480 00000000`00000032 00000190`de243c00 : binaryninjacore+0x494b33
0000005a`077ff380 00007ff8`bce882cb     : 0000005a`077ff730 00000191`18d44600 00007ff8`c1403f28 0000005a`077ffa28 : binaryninjacore!BNRegisterGlobalFunctionRecognizer+0xb8ec1
0000005a`077ff580 00007ff8`bcc4fd66     : 0000005a`077ffa10 00000192`67780500 0000018f`d6343100 00000192`677805f8 : binaryninjacore!BNGetHighLevelILVariables+0x30cdb
0000005a`077ff9a0 00007ff8`bcc1e5ce     : 0000018f`d6343100 0000018f`d6343100 00000000`00000006 0000018f`68a648d8 : binaryninjacore!BNSetFlowGraphNodeLines+0xf2956
0000005a`077ffb00 00007ff8`bd2cdb5e     : 0000018f`68a642c0 00000000`00000000 00000000`00000000 00000000`0000000f : binaryninjacore!BNSetFlowGraphNodeLines+0xc11be
0000005a`077ffb30 00007ff8`bd46a0c8     : 0000005a`077ffc48 00000000`0000000f 00000000`00000006 0000005a`077ffcc0 : binaryninjacore!BNTagTypeSetVisible+0x4f4e
0000005a`077ffbc0 00007ff8`bd464e21     : 00000000`00000000 0000018f`64239560 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x85b8
0000005a`077ffd10 00007ff9`e5841bb2     : 0000018f`652c8ca0 00000000`00000000 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x3311
0000005a`077ffd40 00007ff9`e5dc7344     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
0000005a`077ffd70 00007ff9`e7a626b1     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
0000005a`077ffda0 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

00000062`147fb9a0 00007ff8`bcfee534     : 00000000`00000001 00000000`00000000 00000062`147fbac0 00000254`ab910800 : binaryninjacore!BNRegisterCallingConvention+0x9ee60
00000062`147fb9d0 00007ff8`bcff8021     : 00000255`4b101f80 00000255`4b101f80 00000000`00000021 00000000`00000060 : binaryninjacore!BNWaitForMainThreadAction+0x18524
00000062`147fba10 00007ff8`bd06f1b6     : 00000255`adae4f90 00000255`adae4f98 00000000`00000000 00000000`000002f6 : binaryninjacore!BNWaitForMainThreadAction+0x22011
00000062`147fbda0 00007ff8`bcff85e2     : 00000062`147fc570 00000062`147fc808 00000000`00000000 00000000`00000001 : binaryninjacore!BNGetMediumLevelILVariables+0x1d726
00000062`147fc540 00007ff8`bd0968f2     : ffffffff`ffffffff 00007ff8`bcff8580 00000062`147fc6c0 00000062`147ff8b0 : binaryninjacore!BNWaitForMainThreadAction+0x225d2
00000062`147fc5c0 00007ff8`bcc50816     : 00000062`147ff8b0 00000254`ab360000 00000000`00000006 00000000`00000000 : binaryninjacore!BNGetMediumLevelILVariables+0x44e62
00000062`147ff700 00007ff8`bcc1e525     : 00000255`54692a00 00000254`cfa52300 00000000`00000006 00000254`b1a0e158 : binaryninjacore!BNSetFlowGraphNodeLines+0xf3406
00000062`147ffb60 00007ff8`bd2cdb5e     : 00000254`b1a0e680 00000000`00000000 00000000`00000000 00000000`0000000f : binaryninjacore!BNSetFlowGraphNodeLines+0xc1115
00000062`147ffb90 00007ff8`bd46a0c8     : 00000062`147ffca8 00000000`0000000f 00000000`00000006 00000062`147ffd20 : binaryninjacore!BNTagTypeSetVisible+0x4f4e
00000062`147ffc20 00007ff8`bd464e21     : 00000000`00000000 00000254`ac928d70 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x85b8
00000062`147ffd70 00007ff9`e5841bb2     : 00000254`ac5dd490 00000000`00000000 00000000`00000000 00000000`00000000 : binaryninjacore!BNWriteWebsocketClientData+0x3311
00000062`147ffda0 00007ff9`e5dc7344     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
00000062`147ffdd0 00007ff9`e7a626b1     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
00000062`147ffe00 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

In another crash, WinDbg's analysis of the crash dump reports a stack overflow inside this function:

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ff8bce51e3f (binaryninjacore!BNUpdateHighLevelILOperand+0x000000000008836f)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000052bfeb08
Attempt to read from address 0000000052bfeb08

For me, the crash occurs when analyzing ntoskrnl.exe on 22H2. I was able to reproduce it with another binary, but I'm not sure if it would be appropriate to upload (please let me know; I can also share it privately if this is a concern).

Also please let me know if this is a disturbance to this issue. I can create a new issue or contact support.

xusheng6 commented 5 months ago

@notpidgey please feel free to share with us the file in private, either via binaryninja@vector35.com, or our public slack https://slack.binary.ninja/.

It seems like your issue is very similar to the original one, so please keep replying to this thread until we triage and understand the issue further.

xusheng6 commented 5 months ago

Just to provide more information on this: there was a specific Intel i9 13900k/14900k crash in games earlier this year. Could that be related? https://www.theverge.com/2024/4/9/24125036/intel-game-crash-13900k-14900k-fortnite-unreal-engine-investigation

Could you please check whether limiting the power consumption, as suggested in various outlets, e.g., https://www.pcgamesn.com/intel/stop-games-crashing-core-i9-unreal-engine, fixes the issue?

notpidgey commented 5 months ago

I have sent over an email with the issue number in the subject.

Another thing I noticed is that this issue is more common if I try to open multiple files at once. Seems to guarantee a crash for me.

I will try these soon:

* https://www.theverge.com/2024/4/9/24125036/intel-game-crash-13900k-14900k-fortnite-unreal-engine-investigation (did not work)
* https://www.pcgamesn.com/intel/stop-games-crashing-core-i9-unreal-engine (did not work)

xusheng6 commented 4 months ago

@memN0ps @notpidgey Could you please try and see if the crash also happens in the free version or not? https://binary.ninja/free/

notpidgey commented 4 months ago

I just tried and I was unable to reproduce it on free. I also updated to 4.0.5336 Personal (b4281362) and I still get the crash.

xusheng6 commented 4 months ago

> I just tried and I was unable to reproduce it on free.

Thx. The free version has a thread count limit, which probably affects it. If you have other computers, can you test whether you can reproduce it with the regular (personal/commercial) version of binja? I am still trying to figure out the factors that affect the bug.

xusheng6 commented 4 months ago

@notpidgey I still cannot reproduce it with the newly provided binaries. Also, could you please try and see if the issue still happens if you set analysis.limits.workerThreadCount to a lower number?

notpidgey commented 4 months ago

> @notpidgey I still cannot reproduce it with the newly provided binaries. Also, could you please try and see if the issue still happens if you set analysis.limits.workerThreadCount to a lower number?

Changing it to 4 seems to work and for reference my default is 31. Tested this multiple times changing it back and forth to confirm.

xusheng6 commented 4 months ago

> > @notpidgey I still cannot reproduce it with the newly provided binaries. Also, could you please try and see if the issue still happens if you set analysis.limits.workerThreadCount to a lower number?
>
> Changing it to 4 seems to work and for reference my default is 31. Tested this multiple times changing it back and forth to confirm.

Could you try something in between, like 8, 16, or 24?

notpidgey commented 4 months ago

> Could you try something in between, like 8, 16, or 24?

* 8 - No crash
* 16 - Crash (Took a few tries)
* 24 - Crash (First try)

If this is an issue with multithreading, I'm assuming it's possible for it to crash with fewer threads, just less likely. However, I could not get this to happen, but I could try more if you would like.

xusheng6 commented 4 months ago

> > Could you try something in between, like 8, 16, or 24?
>
> * 8 - No crash
> * 16 - Crash (Took a few tries)
> * 24 - Crash (First try)
>
> If this is an issue with multithreading, I'm assuming it's possible for it to crash with fewer threads, just less likely. However, I could not get this to happen, but I could try more if you would like.

These are very valuable data points! Now I am more inclined to believe this is related to the thread count, and not tied to the specific CPU. Surprisingly, I have an Intel i7-13700K, which has 24 threads, and I have not been able to reproduce it even once. I will try harder and see if I can reproduce it.

I will also look at your crash dumps and see if I can get any clues. In the worst case, if we still cannot figure it out, we may ask you to do a TTD (https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-overview) recording of the crash and share with us a trace of the execution. More on that later.

xusheng6 commented 4 months ago

The user shared multiple crash dumps with us. Below is a representative one (binaryninja.exe.25144.dmp):

0:052> k
 # Child-SP          RetAddr               Call Site
00 000000e5`a10fc818 00007ff9`e529bea0     ntdll!NtWaitForMultipleObjects+0x14
01 000000e5`a10fc820 00007ff9`e529bd9e     KERNELBASE!WaitForMultipleObjectsEx+0xf0
02 000000e5`a10fcb10 00007ff9`e5e1f93a     KERNELBASE!WaitForMultipleObjects+0xe
03 000000e5`a10fcb50 00007ff9`e5e1f376     kernel32!WerpReportFaultInternal+0x58a
04 000000e5`a10fcc70 00007ff9`e536da19     kernel32!WerpReportFault+0xbe
05 000000e5`a10fccb0 00007ff9`e7ab5570     KERNELBASE!UnhandledExceptionFilter+0x3d9
06 000000e5`a10fcdd0 00007ff9`e7a9ca06     ntdll!RtlUserThreadStart$filt$0+0xa2
07 000000e5`a10fce10 00007ff9`e7ab247f     ntdll!_C_specific_handler+0x96
08 000000e5`a10fce80 00007ff9`e7a614b4     ntdll!RtlpExecuteHandlerForException+0xf
09 000000e5`a10fceb0 00007ff9`e7ab0f8e     ntdll!RtlDispatchException+0x244
0a 000000e5`a10fd5c0 00007ff9`de951a74     ntdll!KiUserExceptionDispatch+0x2e
0b 000000e5`a10fdcd8 00007ff8`bc545ce2     VCRUNTIME140!memset+0x1d4 [D:\a\_work\1\s\src\vctools\crt\vcruntime\src\string\amd64\memset.asm @ 325] 
0c 000000e5`a10fdce0 000000e5`a10fdd70     binaryninjacore+0x495ce2
0d 000000e5`a10fdce8 00000000`00000008     0x000000e5`a10fdd70
0e 000000e5`a10fdcf0 00000000`00000000     0x8

The user is using Build: 4.0.4958 Personal (ddff9339)

Note that not all of the crash dumps start from binaryninjacore+0x495ce2. Some start from a different address, and some suffer from such severe stack corruption that the stack trace is not very useful.

xusheng6 commented 4 months ago

@notpidgey can you monitor the RAM usage when the crash happens? Could it be an OOM?

notpidgey commented 4 months ago

@xusheng6 Doesn't seem to be an OOM issue.

image

notpidgey commented 4 months ago

Is there anything I could do to help? Would a TTD be of any use?

xusheng6 commented 4 months ago

I have asked my colleagues to try to reproduce it; no progress so far.

As for a TTD recording, we will be shipping a feature that lets you do a TTD recording within BN directly. I think that might be helpful, because you can make the trace file smaller by starting the recording later on rather than from the launch of binja. This feature was not originally intended to be used this way; it is just a coincidence. I will try to post an update on this no later than next Monday.

xusheng6 commented 4 months ago

One of the offending ntoskrnl binaries:

ntoskrnl(1).exe.zip

negasora commented 4 months ago

This doesn't repro for me on 4.1.5422-dev, win10, 5950x, 32GB memory, default settings and no plugins

xusheng6 commented 4 months ago

Hi @notpidgey, the TTD recording in binja will be delayed a bit, so I suggest you do a recording with WinDbg following their docs: https://learn.microsoft.com/en-us/windows-hardware/drivers/debuggercmds/time-travel-debugging-record. I suggest you first launch binaryninja.exe, wait for it to become stable, then attach to the process and start recording. This way, you avoid tracing binja's initialization. The resulting trace file will probably be very large; we have seen trace files of 20GB or even more. Please upload it somewhere and share a link with me in private (since the trace file may contain your private info).

xusheng6 commented 4 months ago

After looking at the TTD trace, I now suspect that this is a stack overflow. The crash happens in a deeply recursive function where there seems to be no other obvious reason to crash. The binaryninja.exe we ship has a 1MB stack size, which could be exhausted.

@notpidgey would you like to try patching binaryninja.exe to give it a larger stack size? You can do it using binja itself: just set sizeOfStackReserve to a larger value, e.g., 0x1000000 (16MB), or even larger. You might also try raising sizeOfHeapReserve if that does not help.
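
If opening the executable in binja is not an option, the same header field can be patched with a short script. The following is only a sketch under stated assumptions: it handles 64-bit (PE32+) images only, relies on the standard PE layout in which SizeOfStackReserve is an 8-byte field at offset 72 of the optional header, and uses a hypothetical helper name `patch_stack_reserve`. Work on a copy of the file, and note that modifying a signed executable invalidates its signature.

```python
import struct

PE32_PLUS_MAGIC = 0x20B      # optional-header magic for 64-bit images
COFF_HEADER_SIZE = 20        # size of the COFF file header
STACK_RESERVE_OFFSET = 72    # SizeOfStackReserve within the optional header


def patch_stack_reserve(image: bytearray, new_size: int) -> int:
    """Overwrite SizeOfStackReserve in a PE32+ image and return the old value."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")

    # e_lfanew at 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if bytes(image[pe_offset:pe_offset + 4]) != b"PE\0\0":
        raise ValueError("PE signature not found")

    optional_header = pe_offset + 4 + COFF_HEADER_SIZE
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic != PE32_PLUS_MAGIC:
        raise ValueError("only PE32+ (64-bit) images are handled in this sketch")

    field = optional_header + STACK_RESERVE_OFFSET
    (old_size,) = struct.unpack_from("<Q", image, field)
    struct.pack_into("<Q", image, field, new_size)
    return old_size


# Example usage on a copy of the executable (file names are placeholders):
# from pathlib import Path
# data = bytearray(Path("binaryninja_copy.exe").read_bytes())
# print(hex(patch_stack_reserve(data, 0x1000000)))  # prints the previous reserve
# Path("binaryninja_patched.exe").write_bytes(data)
```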

image

notpidgey commented 4 months ago

Opening Binja in Binja causes a crash for me as well. I was able to patch it in a different way, but even with a stack reserve of 256MB I still get a crash. I could send another TTD if that helps.

Edit: Modifying heap reserve did not fix the issue either.

xusheng6 commented 4 months ago

@notpidgey yes, a TTD replay after the patch would be helpful!

xusheng6 commented 4 months ago

The user reproduced the bug with a debugger attached to BN:

EXCEPTION_DEBUG_INFO:
           dwFirstChance: 0
           ExceptionCode: C0000409 (STATUS_STACK_BUFFER_OVERRUN)
          ExceptionFlags: 00000001
        ExceptionAddress: binaryninjacore.00007FF970B929C1
        NumberParameters: 1
ExceptionInformation[00]: 0000000000000002
Last chance exception on 00007FF970B929C1 (C0000409, STATUS_STACK_BUFFER_OVERRUN)!
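
As an aside, the exception record above can be decoded mechanically. The snippet below is an illustrative sketch, not part of the original report: for C0000409, the first ExceptionInformation value is a fast-fail code, and the table lists only the first few FAST_FAIL_* values from the Windows SDK header winnt.h. The function name `describe_exception` is hypothetical.

```python
STATUS_STACK_BUFFER_OVERRUN = 0xC0000409

# First few FAST_FAIL_* codes as defined in winnt.h (not an exhaustive list).
FAST_FAIL_CODES = {
    0: "FAST_FAIL_LEGACY_GS_VIOLATION",
    1: "FAST_FAIL_VTGUARD_CHECK_FAILURE",
    2: "FAST_FAIL_STACK_COOKIE_CHECK_FAILURE",
    3: "FAST_FAIL_CORRUPT_LIST_ENTRY",
    4: "FAST_FAIL_INCORRECT_STACK",
    5: "FAST_FAIL_INVALID_ARG",
}


def describe_exception(code: int, information: list[int]) -> str:
    """Name the fast-fail reason carried by a STATUS_STACK_BUFFER_OVERRUN record."""
    if code != STATUS_STACK_BUFFER_OVERRUN:
        return f"exception {code:08X} (not decoded by this sketch)"
    if not information:
        return "STATUS_STACK_BUFFER_OVERRUN (no fast-fail code recorded)"
    name = FAST_FAIL_CODES.get(
        information[0], f"unknown fast-fail code {information[0]}"
    )
    return f"STATUS_STACK_BUFFER_OVERRUN: {name}"


print(describe_exception(0xC0000409, [0x2]))
```

A stack cookie check failure means a compiler-inserted guard value on the stack was overwritten. That is distinct from running out of stack space, which is normally reported as C00000FD (STATUS_STACK_OVERFLOW), though on its own it does not identify the cause.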

xusheng6 commented 4 months ago

Another user with an i9-13900K and 32GB of RAM running Windows helped test this and got no crash.

psifertex commented 4 months ago

All evidence continues to point to this being a specific bug with the latest Intel hardware. We'll leave this open for now and re-visit it once Intel releases a potential fix.

riskydissonance commented 4 months ago

Just adding that I also faced this crash on 4.1.5560-dev with Win11, an i9-14900k, and 64GB RAM when re-analysing an old .bndb. Reducing the worker thread count to 4 as instructed above allowed the .bndb to be analysed fully without the crash.

The stack trace was wildly different on every crash. @glenns has more details; thanks to him for all the help.

riskydissonance commented 4 months ago

FYI, I was also experiencing crashes with one particular game. A thread suggested updating my BIOS, which has fixed both the game and the loading of the crashing .bndb mentioned above. This is back on a worker thread count of 31 with 4.1.5575-dev.

notpidgey commented 4 months ago

I ordered a replacement processor yesterday so this is actually some unfortunate timing...

I updated my BIOS as well and it seems to have worked. Along with the BIOS update, a lot of my motherboard default settings got changed. I'm sure it's hardly relevant, but here is the list in case someone runs into this issue.

BCLK Frequency 100.00 -> Auto
DRAM Frequency Auto -> DDR5-6000MHz
DRAM CAS Latency Auto -> 36
DRAM RAS to CAS Delay Read Auto -> 36
DRAM RAS to CAS Delay Write Auto -> 36
DRAM RAS PRE Time Auto -> 36
DRAM RAS ACT Time Auto -> 96
DRAM VDD Voltage Auto -> 1.35
DRAM VDDQ Voltage Auto -> 1.35

psifertex commented 4 months ago

Thanks for the updates, glad it's resolved! We'll add a note to our troubleshooting documentation shortly.