Invalid Ptr read in blake3 causing the master node to crash

0vercl0k / wtf

wtf is a distributed, code-coverage guided, customizable, cross-platform snapshot-based fuzzer designed for attacking user and / or kernel-mode targets running on Microsoft Windows and Linux user-mode (experimental!).

MIT License

Invalid Ptr read in blake3 causing the master node to crash #207

Closed 0xDivyanshu-new closed 4 months ago

0xDivyanshu-new commented 4 months ago

I am trying to fuzz my target, but it's crashing on an invalid pointer read inside blake3. Stack information:

(acc.1a60): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
wtf!blake3_compress_in_place_sse41+0x39:
00007ff7`ea5036f4 0f28142500e0a7ea movaps  xmm2,xmmword ptr [0FFFFFFFFEAA7E000h] ds:ffffffff`eaa7e000=????????????????????????????????
0:000> kb
 # RetAddr               : Args to Child                                                           : Call Site
00 00000000`5f000000     : 00000000`00000000 00000000`00000000 00000000`00000000 3f000000`00000000 : wtf!blake3_compress_in_place_sse41+0x39
01 00000000`00000000     : 00000000`00000000 00000000`00000000 3f000000`00000000 00000000`00000000 : 0x5f000000

I am not quite sure why, but I ran wtf with a single testcase and it worked just fine; the issue comes up only when I run both the master node and a worker node.

OS: Windows 10

The snapshot is correct since it already worked with a single testcase. The snapshot target is a Windows machine. I am just running Target_t xxxx("xxxx", Init, InsertTestcase); //, HonggfuzzMutator_t::Create); without any mutator for the time being.

When I run the worker and master together, the master crashes first with the above error after the client has run the testcase. I get this output from the client, which indicates that the testcase ran:

Dialing to tcp://localhost:31337/..
[+] InsertTestCase Size: 210
[+] Stopping
#1 cov: 12189 exec/s: 0.0 lastcov: 0.0s crash: 0 timeout: 0 cr3: 0 uptime: 13.0s

Right after this, the master node crashes, and the client gets -1 from its recv() call and terminates.

Not sure what other info you might need. Let me know if you need anything more.

0vercl0k commented 4 months ago

This stack trace is bizarre as there are no frames; is it the client or the server that generates that crash?

Cheers

0vercl0k commented 4 months ago

Okay, sorry, you did mention in the title of the issue that it is the master node; I missed that. Let me check some of the code.

0vercl0k commented 4 months ago

In the meantime, I would recommend that you set up a repro, attach a debugger to the master node, set a breakpoint on Server_t::HandleNewResult, and step through the code from there to see what's going on and gather more context.

0vercl0k commented 4 months ago

The only place I think blake3 is used in the master node is Corpus_t::SaveTestcase; you could also break there. It feels like you might be receiving a corrupted testcase. Also, turn on page heap for wtf.exe; there might be some memory corruption happening.

Anyway, it'd be great if you could break on that function, see if anything looks weird, and report back, because I don't have much information to chase down what's happening.

0xDivyanshu-new commented 4 months ago

So it's crashing inside the Blake3HexDigest function, which is called from SaveTestcase. Blake3HexDigest calls blake3_hasher_update, which is where the crash happens.

The initialization works, but it crashes inside the hasher update.

0xDivyanshu-new commented 4 months ago

So it seems like the Data argument to update might be messed up for some reason. DataSize seems fine!

0:000> dps @rcx
0000009c`fa6fad20  bb67ae85`6a09e667
0000009c`fa6fad28  a54ff53a`3c6ef372
0000009c`fa6fad30  9b05688c`510e527f
0000009c`fa6fad38  5be0cd19`1f83d9ab
0000009c`fa6fad40  bb67ae85`6a09e667
0000009c`fa6fad48  a54ff53a`3c6ef372
0000009c`fa6fad50  9b05688c`510e527f
0000009c`fa6fad58  5be0cd19`1f83d9ab
0000009c`fa6fad60  00000000`00000000
0000009c`fa6fad68  00000000`00000000
0000009c`fa6fad70  00000000`00000000
0000009c`fa6fad78  00000000`00000000
0000009c`fa6fad80  00000000`00000000
0000009c`fa6fad88  00000000`00000000
0000009c`fa6fad90  00000000`00000000
0000009c`fa6fad98  00000000`00000000
0:000> dps @rdx
00000275`88486f20  ffffffff`00000001
00000275`88486f28  00000100`00000300
00000275`88486f30  00000100`00000000
00000275`88486f38  000000ca`dd7f0000
00000275`88486f40  00010000`00010000
00000275`88486f48  58010000`00010000
00000275`88486f50  01010000`00000b2e
00000275`88486f58  01000000`02000000
00000275`88486f60  010b2e58`01000000
00000275`88486f68  00000003`00000000
00000275`88486f70  48f1f6ac`48fc9f51
00000275`88486f78  6df746ac`283b588b
00000275`88486f80  00000001`00000000
00000275`88486f88  00000000`00000002
00000275`88486f90  6d690000`000a0000
00000275`88486f98  588c4ac9`2fe61feb
0:000> dps @r8 L2
00000000`000000d2  ????????`????????
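
For reference, under the Windows x64 calling convention the first three integer arguments arrive in rcx, rdx and r8, so for blake3_hasher_update these would be the hasher, the data pointer and the length. The following is a minimal sketch, assuming the breakpoint was taken at the entry of blake3_hasher_update right after initialization, that checks the first qwords of the `dps @rcx` dump against the BLAKE3 IV words and the r8 value against the 210-byte testcase reported by the client. The names kBlake3Iv, kRcxDump and HoldsBlake3Iv are hypothetical helpers for illustration and are not part of wtf or BLAKE3.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// BLAKE3 initialization vector (the same eight words as the SHA-256 IV).
constexpr std::array<std::uint32_t, 8> kBlake3Iv = {
    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19};

// First four qwords printed by `dps @rcx`, copied from the dump above.
constexpr std::array<std::uint64_t, 4> kRcxDump = {
    0xbb67ae856a09e667, 0xa54ff53a3c6ef372,
    0x9b05688c510e527f, 0x5be0cd191f83d9ab};

// Hypothetical helper: true if the little-endian qwords hold the IV words.
constexpr bool HoldsBlake3Iv(const std::array<std::uint64_t, 4> &Qwords) {
  for (std::size_t I = 0; I < Qwords.size(); I++) {
    const auto Lo = static_cast<std::uint32_t>(Qwords[I]);
    const auto Hi = static_cast<std::uint32_t>(Qwords[I] >> 32);
    if (Lo != kBlake3Iv[2 * I] || Hi != kBlake3Iv[2 * I + 1]) {
      return false;
    }
  }
  return true;
}

// The hasher state at @rcx starts with the IV, as expected after init.
static_assert(HoldsBlake3Iv(kRcxDump), "hasher words match the BLAKE3 IV");
// r8 = 0xd2 matches the "InsertTestCase Size: 210" line from the client.
static_assert(0xd2 == 210, "length argument matches the testcase size");
```

Both checks pass at compile time, which is consistent with the hasher and the length being sane at this point.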

0xDivyanshu-new commented 4 months ago

This seems to be a blake3 bug. The error seems to be related to the asm code in blake3_sse41_x86-64_windows.

0xDivyanshu-new commented 4 months ago

The BLAKE3_IV reference seems to be the arbitrary pointer.

blake3_compress_in_place_sse41 PROC
_blake3_compress_in_place_sse41 PROC
        sub     rsp, 120
        movdqa  xmmword ptr [rsp], xmm6
        movdqa  xmmword ptr [rsp+10H], xmm7
        movdqa  xmmword ptr [rsp+20H], xmm8
        movdqa  xmmword ptr [rsp+30H], xmm9
        movdqa  xmmword ptr [rsp+40H], xmm11
        movdqa  xmmword ptr [rsp+50H], xmm14
        movdqa  xmmword ptr [rsp+60H], xmm15
        movups  xmm0, xmmword ptr [rcx]
        movups  xmm1, xmmword ptr [rcx+10H]
        movaps  xmm2, xmmword ptr [BLAKE3_IV]

Snippet from blake3_sse41_x86-64

0vercl0k commented 4 months ago

Hmmm, this is bizarre; it feels like this might be a compilation problem or something? Can you try to do a clean build?

If I disassemble this function in the latest published wtf.exe binary, this is what I have:

0:000> u wtf!blake3_compress_in_place_sse41+0x39
wtf!blake3_compress_in_place_sse41+0x39:
00000001`401c3dfb 0f2815fe921800  movaps  xmm2,xmmword ptr [wtf!__xt_z+0xc0 (00000001`4034d100)]
00000001`401c3e02 0fb68424a0000000 movzx   eax,byte ptr [rsp+0A0h]
00000001`401c3e0a 450fb6c0        movzx   r8d,r8b
00000001`401c3e0e 48c1e020        shl     rax,20h
00000001`401c3e12 4c03c0          add     r8,rax
00000001`401c3e15 66490f6ed9      movq    xmm3,r9
00000001`401c3e1a 66490f6ee0      movq    xmm4,r8
00000001`401c3e1f 660f6cdc        punpcklqdq xmm3,xmm4

0:000> dqs 00000001`4034d100
00000001`4034d100  bb67ae85`6a09e667
00000001`4034d108  a54ff53a`3c6ef372
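
One way to read the difference between the two builds, offered as an interpretation of the instruction bytes rather than a confirmed diagnosis: in the published binary the load is encoded as `0f 28 15` plus a 32-bit displacement, which is RIP-relative addressing, while in the crash dump at the top of the issue it is encoded as `0f 28 14 25` plus a 32-bit displacement, which is an absolute address with no base register. In 64-bit mode that displacement is sign-extended, so it cannot reach an image loaded around 00007ff7`ea50xxxx. The sketch below redoes the arithmetic for both encodings; the function names are hypothetical and the instruction bytes are copied from the two dumps.

```cpp
#include <cstdint>

// RIP-relative operand: target = address of the next instruction + disp32.
constexpr std::uint64_t RipRelativeTarget(std::uint64_t InsnAddr,
                                          std::uint64_t InsnLen,
                                          std::int32_t Disp32) {
  return InsnAddr + InsnLen +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(Disp32));
}

// Absolute operand (SIB byte with no base and no index): the 32-bit
// displacement is sign-extended to 64 bits and used as the address.
constexpr std::uint64_t AbsoluteDisp32Target(std::uint32_t Disp32) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(Disp32)));
}

// Published binary: 0f 28 15 fe 92 18 00 at 00000001`401c3dfb (7 bytes).
static_assert(RipRelativeTarget(0x1401c3dfb, 7, 0x001892fe) == 0x14034d100,
              "matches the address WinDbg resolves for the IV");

// Crashing binary: 0f 28 14 25 00 e0 a7 ea, so disp32 = 0xeaa7e000.
static_assert(AbsoluteDisp32Target(0xeaa7e000) == 0xffffffffeaa7e000,
              "matches the faulting address in the crash dump");
```

The first result is the address dumped with `dqs` above, where the IV words live. The second is exactly the faulting address, which would fit the assembly file having been built in a way that produced a truncated 32-bit absolute reference to BLAKE3_IV.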

0xDivyanshu-new commented 4 months ago

Alright, this was some unknown, weird issue when compiling with clang-cl. I compiled with cl instead and it works.

Closing this issue!

0vercl0k commented 4 months ago

Cool, I'm glad this is figured out!

Cheers