GaloisInc / macaw

Open source binary analysis tools.
BSD 3-Clause "New" or "Revised" License
199 stars 20 forks source link

Inconsistent classification of syscalls with respect to side effects #400

Open RyanGlScott opened 1 month ago

RyanGlScott commented 1 month ago

archFnHasSideEffects classifies whether a primitive function has implicit side effects or not, with each architecture having different implementations. I was surprised to learn that each architecture classifies system calls differently vis-à-vis side effects. macaw-x86 classifies syscalls as having side effects:

https://github.com/GaloisInc/macaw/blob/83d3907054181b17be365bdea9abb25edd48aae3/x86/src/Data/Macaw/X86/ArchTypes.hs#L1082

But neither AArch32 nor RISC-V do:

https://github.com/GaloisInc/macaw/blob/83d3907054181b17be365bdea9abb25edd48aae3/macaw-aarch32/src/Data/Macaw/ARM/Arch.hs#L554-L556 https://github.com/GaloisInc/macaw/blob/83d3907054181b17be365bdea9abb25edd48aae3/macaw-riscv/src/Data/Macaw/RISCV/Arch.hs#L137-L139

This feels inconsistent. I would think that system calls should be treated as having side effects, although I don't have a concrete example where classifying them as side-effect–free would cause them to go wrong.

sauclovian-g commented 1 month ago

It depends to some extent on which things you consider side effects, which is sort of a fraught question (e.g. is the effect of kill(getpid(), SIGSEGV) a side effect or just a program step?) but in general yes they do, and while going wrong is probably not that likely in an environment that isn't trying to model the system/kernel state, consider unlinking a file (which can cause a subsequent repeated syscall to behave differently) or, more extreme, forking a subprocess that updates a shared memory region.

RyanGlScott commented 1 month ago

From Macaw's perspective, "side effectful" means "should always be considered as being demanded when performing CFG dependency analysis" (see this code). To add a further wrinkle, Macaw doesn't encode the semantics of any system calls directly, but instead provides a generic system call hook that lets users attach whatever semantics they want to any system call. As such, I think Macaw must be conservative here and assume that the side effects must be extreme enough to warrant returning True here.