dusklang / dusk

The Dusk Programming Language
Other
0 stars 0 forks source link

Refinement Checker Part I #90

Closed zachwolfe closed 3 years ago

zachwolfe commented 3 years ago

New approach

Old approach

# Goals for this pull request
- [x] Print a proof when we fail.
- [x] Try switching from forall quantifiers to proving the negation. See "Satisfiability and Validity" [here](https://rise4fun.com/z3/tutorial#:~:text=The%20command%20check%2Dsat%20determines,formula%20is%20satisfiable%20or%20not.).
- [x] Parse pre- and post-condition attributes on functions
- [x] Convert the condition expressions to refinement Constraints
- Basic tracking of constraints and overflow detection on bare integer values
  - [x] Preconditions
  - [x] Postconditions
  - [x] Check pre/post-conditions on non-intrinsic function calls
  ~~- [ ] Add pre/post-conditions to intrinsic functions the same way they're added to non-intrinsic functions~~
    - Edit: nah, intrinsics use a separate Instr variant from function calls, so I don't think this is worth it.
- Tracking of pointers
  - [x] For malloc and alloca, add a constraint to the pointer value with its origin, consisting of the original pointer value, allocation size, etc.
  - [x] When casting a pointer to an integer, attach the origin info to the cast value.
  - [x] When casting a pointer to another pointer type, validate its origin information (i.e., that the destination pointee type fits in the allocation, and just generally that the derived pointer is not out of the bounds of the original allocation)
  - [x] When casting a pointer to an integer, add the constraints that 0 <= the int <= USIZE_MAX - len + offset
  - When casting an integer to a pointer, validate its origin information
    - [x] The easy case: the user cast directly from a pointer to an integer, and back to a pointer. The pointer will have direct origin information
    - The harder (and more useful) case: the user manipulated the integer before casting back to a pointer. There are two options I can think of here. This first is, let's say we add 1 to a pointer-derived integer. Then, we can detect that and compute an origin for the integer, using the original origins and the constraints on the sum. The second is, I can walk backward through the constraints on the integer, trying to find a pointer origin somewhere. The first option seems simpler.
      - [x] Addition
      - [x] Subtraction
      - [ ] Subtraction of two ints that come from comparable pointers (i.e. from the same allocation)
        ~~- [ ] Negation~~
          - Edit: no, that obviously doesn't make any sense
- Basic pointer typestate
  - [x] Track the currently-stored value (as in OpId) in the pointer, and pass its constraints on through a load
  - [x] Track a Vec of possibly-aliased ranges for each pointer, where each range has a start offset, a length, and a kind (either a write with the written OpId, or an opaque "unknown" value, used for statics). Items later in the Vec take precedence.
    - Advice: This is pretty complicated! You should be conservative and only handle the cases you’re sure make sense.
    - Idea: Allow a new write to the pointer if and only if it fully aliases (covers the full range of) zero or more prior writes, AND it does not partially alias any other prior writes.
  - [x] Use the Vec of possibly-aliased ranges in reads and writes.
- Structs
  - [ ] Handle struct literals and direct field accesses, without losing information about constraints, typestate or origin
  - [ ] Handle indirect struct field accesses without losing information
  - [ ] Allow (and correctly handle) struct field accesses in pre/postconditions
- Mutable values in interface
  - [ ] Track pointer typestate for static variables (recall: the problem is each access to the same static is seen as an entirely independent AddressOfStatic instruction; we need a way to connect these up to the same source of truth)
  - [ ] Allow constraining static variables in pre/postconditions
  - [ ] Allow constraining dereferences of pointer parameters in pre/postconditions (e.g. `@postcondition(*b > a)`)
  - [ ] Allow referring to the old version of a mutable value in postconditions (either pointer deref or static var)
    - Example: `@postcondition(*a == old(*a) + 2)`
  - [ ] Don't allow writing to a pointer unless it originated in the current function (by malloc or alloca), or the value at the pointer has at least one postcondition associated with it
  - [ ] When calling a function, eliminate and replace any relevant pointer typestate with the postconditions of the function
- Lifetime analysis
  - [ ] For malloc'ed pointers, track the known live state of the pointer (through calls to free, that is), and disallow use-after-free (including double-free).
  - [ ] Disallow calls to free for non-malloc'ed pointers and malloc'ed pointers with nonzero offsets
- Loop-free branches
  - [ ] Unconditional branching: pretty simple
  - [ ] Conditional branching: a bit more complex, but doable
- [ ] Get rid of printing on refinement failure
- I have no idea how to handle loops or recursion yet, so they are **out of scope** for this pull request
- Also out of scope is any kind of borrow checker, where origins of pointers are tracked across function boundaries
zachwolfe commented 3 years ago

loop sample mir (names cleaned up):

fn len(%str: u8*): usize {
%bbSTART:
    %len_ptr = alloca usize
    %zero = 0 as usize
    store %zero in %len_ptr
    br %bbCHECK
%bbCHECK:
    %str_addr = reinterpret %str as usize
    %len = load %len_ptr
    %cur_char_addr = intrinsic `+`(%str_addr, %len)
    %cur_char_ptr = reinterpret %cur_char_addr as u8*
    %cur_char = load %cur_char_ptr
    %zero.1 = 0 as u8
    %condition = intrinsic `!=`(%cur_char, %zero.1)
    condbr %condition, %bbBODY, %bbEND
%bbBODY:
    %len.1 = load %len_ptr
    %one = 1 as usize
    %new_len = intrinsic `+`(%len.1, %one)
    store %new_len in %len_ptr
    br %bbCHECK
%bbEND:
    %len.2 = load %len_ptr
    return %len.2
}