angr / simuvex

[DEPRECATED] A symbolic execution engine for the VEX IR
BSD 2-Clause "Simplified" License
79 stars 57 forks source link

snprintf SimProcedure #23

Open esanfelix opened 8 years ago

esanfelix commented 8 years ago

I'm not sure this should belong here as "an issue" or you'd rather receive these through other channels (si there a mailing list?) so let me know if that's the case.

The snprintf SimProcedure is currently missing (and vsnprintf kinda unimplemented as well, with a simple return size-1). I've been playing around with some binaries that use snprintf... and tried to implement it based off sprintf.

However, I do not know exactly what's the "correct" way to constrain the size of the output string. sprintf does this:

        fmt_str = self._parse(1)
        out_str = fmt_str.replace(2, self.arg)
        self.state.memory.store(dst_ptr, out_str)

So I can easily increase the indices passed to self._parse and fmt_str.replace. However, I am in doubt on what would be the proper way to constrain the buffer size...

rhelmot commented 8 years ago

This is a bit of a general problem in symbolic execution - when you run across symbolic data-size fields, and you need to put some concrete number of bytes into memory, how do you handle it? In several places in angr we make the choice that we should concretize the size to the maximum of the possible values it could represent (state.se.max_int, I think), but you have to put an additional check on top of that that you aren't about to ask for four gigabytes of data to be written to memory.

For snprintf and friends, you might want to just take out_str, and then if it's too long (according to whatever metric by which you think it's appropriate to concretize the n parameter), truncate it before storing it to memory.

Also, yes there is a mailing list! It's listed as the contact email for the angr organization on github :+)

iamahuman commented 7 years ago

@esanfelix You mean something like this?

class snprintf(FormatParser):
    #pylint:disable=arguments-differ

    def run(self, dst_ptr, size, fmt):
        # The format str is at index 2
        fmt_str = self._parse(2)
        out_str = fmt_str.replace(3, self.arg)
        orig_size = out_str.size()
        max_wsz = self.state.se.max_int(size)
        act_len = orig_size // 8

        if max_wsz > 0:
            # cut the out_str to maximum possible write size
            if orig_size > (max_wsz - 1) * 8:
                out_str = out_str.get_bytes(0, max_wsz - 1)
            req_len = size - 1
            written_len = self.state.se.If(req_len < act_len, req_len, act_len)

            # place the string
            self.state.memory.store(dst_ptr, out_str, size=written_len, condition=size > 0)

            # place the terminating null byte
            self.state.memory.store(dst_ptr + written_len, self.state.se.BVV(0, 8), condition=size > 0)

        # size_t has size arch.bits
        return self.state.se.BVV(act_len, self.state.arch.bits)