GrapheneOS / hardened_malloc

Hardened allocator designed for modern systems. It is integrated into Android's Bionic libc and can be used externally with musl and glibc as a dynamic library on other Linux-based platforms. It will gain more portability / integration over time.
https://grapheneos.org/
MIT License

Add guarded memcpy #231

Open jvoisin opened 7 months ago

jvoisin commented 7 months ago

snmalloc has a cool low overhead check in memcpy to detect cross-zones copies, is this something that we might want to add to hardened_malloc?

thestinger commented 7 months ago

It's already supported via the malloc_object_size API and it's up to libc whether it wants to use it.

jvoisin commented 7 months ago

snmalloc provides it natively, without having to have the libc support it.

qua3k commented 3 months ago

Would such a mitigation prevent any more attacks than the slab canaries already do?

jvoisin commented 3 months ago

This would catch the corruption before it happens, while slab canaries only detect it after the fact, and only at the point where the canary is actually checked.

But with MTE, it's kinda useless I think.

thestinger commented 3 months ago

@jvoisin Yeah, it's useless with MTE, but we support platforms without MTE so it would make sense to address as a low priority feature.

jvoisin commented 3 months ago

It adds a fair share of complexity:

```asm
<memcpy_guarded>:
    mov    rax,QWORD PTR [rip+0xbfa]        # Load chunk map base
    test   rax,rax                          # Check if chunk map is initialised
    je     DONE                             #  |
    mov    rcx,rdi                          # Get chunk map entry
    shr    rcx,0xa                          #  |
    and    rcx,0xfffffffffffffff0           #  |
    mov    rax,QWORD PTR [rax+rcx*1+0x8]    # Load sizeclass
    and    eax,0x7f                         #  |
    shl    rax,0x5                          #  |
    lea    r8,[sizeclass_meta_data]         #  |
    mov    rcx,QWORD PTR [rax+r8*1]         # Load object size
    mov    r9,QWORD PTR [rax+r8*1+0x8]      # Load slab mask
    and    r9,rdi                           # Offset within slab
    mov    rax,QWORD PTR [rax+r8*1+0x10]    # Load modulus constant
    imul   rax,r9                           # Perform reciprocal modulus
    shr    rax,0x36                         #  |
    imul   rax,rcx                          #  |
    sub    rcx,r9                           # Find distance to end of object
    add    rcx,rax                          #  |
    cmp    rcx,rdx                          # Compare to length of memcpy
    jb     ERROR                            #  |
DONE:
    jmp    <memcpy>
ERROR:
    ud2                                     # Trap
```

I'm not sure it's worth the hassle.

SkewedZeppelin commented 2 months ago

Here are some Android examples that this could benefit:

- A-194105348 CVE-2021-39623 EoP/High
- A-246932269 CVE-2023-20945 EoP/High
- A-258652631 CVE-2023-20951 RCE/Critical
- A-261867748 CVE-2023-20954 RCE/Critical
- A-275418191 CVE-2023-21127 RCE/Critical
- A-276898739 CVE-2024-0030 ID/High

thestinger commented 2 months ago

There are guaranteed guard pages around every allocation above 16k with the default guard slab interval of 1, so that already provides protection against a lot of attacks even without memory tagging. We used to implement a feature like this for a lot of libc functions, but that has been gone for years now. The infrastructure for it is partially there in hardened_malloc already via the object size functions.