danleh / wasabi

A dynamic analysis framework for WebAssembly programs.
http://wasabi.software-lab.org
MIT License
363 stars 47 forks source link

Missing {end_function} call #8

Closed ghost closed 5 years ago

ghost commented 5 years ago

In the following case, Wasabi misses to place the {end_function} instruction call (call 551). I would expect that before every return statement, one is missing.

Looking at the ends of nested if statements, it seems as if there may be some more misplaced instructions. However, I do not understand this in detail so it might as well be wrong about that.

Original code:

  (func (;787;) (type 8) (param i32 i32)
    (local i32 i32 i32 i32)
    block  ;; label = @1
      block  ;; label = @2
        local.get 0
        i32.load
        local.tee 3
        if  ;; label = @3
          local.get 1
          i32.const 15
          i32.gt_u
          local.set 5
          local.get 0
          i32.const 4
          i32.add
          local.tee 2
          i32.load
          i32.eqz
          if  ;; label = @4
            local.get 5
            if  ;; label = @5
              local.get 0
              local.get 1
              i32.const 1
              i32.add
              i32.const 16
              local.get 0
              i32.load offset=24
              i32.const 0
              i32.const 1783927
              i32.const 305
              call 2729
              local.tee 2
              i32.store
              i32.const 0
              local.set 4
              br 3 (;@2;)
            else
              local.get 0
              i32.const 0
              i32.store
              i32.const 0
              local.set 4
              i32.const 0
              local.set 2
              br 3 (;@2;)
            end
          end
          local.get 5
          i32.eqz
          if  ;; label = @4
            local.get 0
            i32.const 0
            i32.store
            i32.const 1
            local.set 4
            i32.const 0
            local.set 2
            br 2 (;@2;)
          end
          local.get 0
          local.get 3
          local.get 1
          i32.const 1
          i32.add
          i32.const 16
          local.get 0
          i32.load offset=24
          i32.const 0
          i32.const 1783927
          i32.const 285
          call 2832
          i32.store
          local.get 2
          local.get 1
          i32.store
          return
        else
          local.get 1
          i32.const 16
          i32.lt_u
          if  ;; label = @4
            return
          else
            local.get 0
            local.get 1
            i32.const 1
            i32.add
            i32.const 16
            local.get 0
            i32.load offset=24
            i32.const 0
            i32.const 1783927
            i32.const 276
            call 2729
            local.tee 2
            i32.store
            i32.const 0
            local.set 4
            local.get 0
            i32.const 4
            i32.add
            local.set 3
          end
        end
      end
      local.get 0
      i32.const 4
      i32.add
      local.set 5
      local.get 2
      if (result i32)  ;; label = @2
        local.get 2
      else
        local.get 5
      end
      local.get 3
      local.get 0
      i32.load offset=20
      i32.const 1
      i32.add
      call 34059
      drop
      local.get 4
      if  ;; label = @2
        local.get 3
        local.get 0
        i32.load offset=24
        call 2687
      end
      local.get 0
      i32.load
      i32.eqz
      if  ;; label = @2
        return
      end
      local.get 0
      local.get 1
      i32.store offset=4
    end)

Instructed code (--hooks=begin,end, 11 additional imports):

  (func (;798;) (type 29) (param i32 i32)
    (local i32 i32 i32 i32)
    i32.const 787
    i32.const -1
    call 548
    block  ;; label = @1
      i32.const 787
      i32.const 0
      call 549
      block  ;; label = @2
        i32.const 787
        i32.const 1
        call 549
        local.get 0
        i32.load
        local.tee 3
        if  ;; label = @3
          i32.const 787
          i32.const 5
          call 550
          local.get 1
          i32.const 15
          i32.gt_u
          local.set 5
          local.get 0
          i32.const 4
          i32.add
          local.tee 2
          i32.load
          i32.eqz
          if  ;; label = @4
            i32.const 787
            i32.const 16
            call 550
            local.get 5
            if  ;; label = @5
              i32.const 787
              i32.const 18
              call 550
              local.get 0
              local.get 1
              i32.const 1
              i32.add
              i32.const 16
              local.get 0
              i32.load offset=24
              i32.const 0
              i32.const 1783927
              i32.const 305
              call 2740
              local.tee 2
              i32.store
              i32.const 0
              local.set 4
              i32.const 787
              i32.const 44
              i32.const 18
              call 552
              i32.const 787
              i32.const 45
              i32.const 16
              call 552
              i32.const 787
              i32.const 102
              i32.const 5
              call 552
              i32.const 787
              i32.const 103
              i32.const 1
              call 554
              br 3 (;@2;)
            else
              local.get 0
              i32.const 0
              i32.store
              i32.const 0
              local.set 4
              i32.const 0
              local.set 2
              br 3 (;@2;)
              i32.const 787
              i32.const 44
              i32.const 18
              call 552
            end
            i32.const 787
            i32.const 45
            i32.const 16
            call 552
          end
          local.get 5
          i32.eqz
          if  ;; label = @4
            i32.const 787
            i32.const 48
            call 550
            local.get 0
            i32.const 0
            i32.store
            i32.const 1
            local.set 4
            i32.const 0
            local.set 2
            i32.const 787
            i32.const 57
            i32.const 48
            call 552
            i32.const 787
            i32.const 102
            i32.const 5
            call 552
            i32.const 787
            i32.const 103
            i32.const 1
            call 554
            br 2 (;@2;)
            i32.const 787
            i32.const 57
            i32.const 48
            call 552
          end
          local.get 0
          local.get 3
          local.get 1
          i32.const 1
          i32.add
          i32.const 16
          local.get 0
          i32.load offset=24
          i32.const 0
          i32.const 1783927
          i32.const 285
          call 2843
          i32.store
          local.get 2
          local.get 1
          i32.store
          i32.const 787
          i32.const 102
          i32.const 5
          call 552
          i32.const 787
          i32.const 103
          i32.const 1
          call 554
          i32.const 787
          i32.const 137
          i32.const 0
          call 554
          i32.const 787
          i32.const 138
          call 551
          return
        else
          local.get 1
          i32.const 16
          i32.lt_u
          if  ;; label = @4
            return
          else
            local.get 0
            local.get 1
            i32.const 1
            i32.add
            i32.const 16
            local.get 0
            i32.load offset=24
            i32.const 0
            i32.const 1783927
            i32.const 276
            call 2740
            local.tee 2
            i32.store
            i32.const 0
            local.set 4
            local.get 0
            i32.const 4
            i32.add
            local.set 3
          end
          i32.const 787
          i32.const 102
          i32.const 5
          call 552
        end
        i32.const 787
        i32.const 103
        i32.const 1
        call 554
      end
      local.get 0
      i32.const 4
      i32.add
      local.set 5
      local.get 2
      if (result i32)  ;; label = @2
        i32.const 787
        i32.const 109
        call 550
        local.get 2
        i32.const 787
        i32.const 111
        i32.const 109
        call 552
      else
        i32.const 787
        i32.const 111
        i32.const 109
        call 555
        local.get 5
        i32.const 787
        i32.const 113
        i32.const 111
        i32.const 109
        call 556
      end
      local.get 3
      local.get 0
      i32.load offset=20
      i32.const 1
      i32.add
      call 34070
      drop
      local.get 4
      if  ;; label = @2
        i32.const 787
        i32.const 122
        call 550
        local.get 3
        local.get 0
        i32.load offset=24
        call 2698
        i32.const 787
        i32.const 127
        i32.const 122
        call 552
      end
      local.get 0
      i32.load
      i32.eqz
      if  ;; label = @2
        i32.const 787
        i32.const 131
        call 550
        i32.const 787
        i32.const 133
        i32.const 131
        call 552
        i32.const 787
        i32.const 137
        i32.const 0
        call 554
        i32.const 787
        i32.const 138
        call 551
        return
        i32.const 787
        i32.const 133
        i32.const 131
        call 552
      end
      local.get 0
      local.get 1
      i32.store offset=4
      i32.const 787
      i32.const 137
      i32.const 0
      call 554
    end
    i32.const 787
    i32.const 138
    call 551)

Wasabi imports:

 - func[548] sig=29 <begin_function> <- __wasabi_hooks.begin_function
 - func[549] sig=29 <begin_block> <- __wasabi_hooks.begin_block
 - func[550] sig=29 <begin_if> <- __wasabi_hooks.begin_if
 - func[551] sig=29 <end_function> <- __wasabi_hooks.end_function
 - func[552] sig=20 <end_if> <- __wasabi_hooks.end_if
 - func[553] sig=29 <begin_loop> <- __wasabi_hooks.begin_loop
 - func[554] sig=20 <end_block> <- __wasabi_hooks.end_block
 - func[555] sig=20 <begin_else> <- __wasabi_hooks.begin_else
 - func[556] sig=28 <end_else> <- __wasabi_hooks.end_else
 - func[557] sig=20 <end_loop> <- __wasabi_hooks.end_loop
 - func[558] sig=28 <br_table> <- __wasabi_hooks.br_table
danleh commented 5 years ago

Mh, interesting find, thanks!

  1. There is also a return hook, which should be inserted before every return instruction. Maybe that is a quick workaround for the missing end hook for your use case? What is your use case?

  2. The difference and the exact semantics for end hook vs. return hook is not documented and unfortunately not even 100% clear to me. I think they somewhat overlap in functionality (i.e. both should be called on return instructions), but one is rather for "block scoping" (end hook, e.g., "When does the execution leave blocks?") vs. "What values does the function return?" (return hook).

  3. I do not really understand this part:

    Looking at the ends of nested if statements, it seems as if there may be some more misplaced instructions.

    What do you mean by misplaced instruction? Hooks by Wasabi? Could you point them out in the instrumented program (ideally ~5 lines)? As a first check: is the instrumented program still valid wasm (according to WABT wasm-vaidate)?

  4. If you want to dig deeper or help me with fixing this (I won't have much time in the next 2 weeks unfortunately), you could

    • reduce the program to a smaller one, ideally with only ~5 instructions and describe where exactly you would expect which hook call, and where it is mising.
    • look into the hook insertion code (for end instructions here: https://github.com/danleh/wasabi/blob/master/src/instrument/add_hooks/mod.rs#L247) and see if there is anything surprising (although I am not sure the code is easy enough to understand).

Thanks again for reporting, sorry that I don't have an immediate answer :)

danleh commented 5 years ago

Just to clarify: You mean there is a call 551 missing in this piece of (yet to be instrumented) code, somewhere in the lower half of the code that you posted

if  ;; label = @4
    return

I agree, and I think the bug must be somewhere in the instrumentation for return instructions: https://github.com/danleh/wasabi/blob/master/src/instrument/add_hooks/mod.rs#L389

Maybe this code is wrong, and forgets to return the function block?: https://github.com/danleh/wasabi/blob/master/src/instrument/add_hooks/mod.rs#L408

ghost commented 5 years ago
  1. Thank's for the hint. I actually tried the return hook and it seems that one has the same problem. After a quick look in the code it might be that Wasabi incorrectly assumes that it is processing dead code there - just a guess. For our use case it is actually okay to miss a few function returns but it would be nicer to catch them eventually.

  2. From your explanation I see how Wasm code is meant to be instructed. However, it seems that is not exactly what it currently does.

  3. It seems, I simply misunderstood the end hook. I was wondering about the many "end block" hooks before some of the return statements. With your explanation this makes sense. However, now I think there are a few more hooks missing in the code. I will highlight some below. The instructed code is valid Wasm and runs smoothly. This issue is only about the missing hooks.

  4. I will try to reduce the test case to a few smaller ones.

  5. Yes, this is what I was mainly concerned with.

if  ;; label = @4
    return

Instructed code with comments. I don't think I understand the internals of Wasabi in detail yet so apologies if I draw the wrong conclusions here.

  (func (;798;) (type 29) (param i32 i32)
    (local i32 i32 i32 i32)
    i32.const 787
    i32.const -1
    call 548
    block  ;; label = @1
      i32.const 787
      i32.const 0
      call 549
      block  ;; label = @2
        i32.const 787
        i32.const 1
        call 549
        local.get 0
        i32.load
        local.tee 3
        if  ;; label = @3
          i32.const 787
          i32.const 5
          call 550
          local.get 1
          i32.const 15
          i32.gt_u
          local.set 5
          local.get 0
          i32.const 4
          i32.add
          local.tee 2
          i32.load
          i32.eqz
          if  ;; label = @4
            i32.const 787
            i32.const 16
            call 550
            local.get 5
            if  ;; label = @5
              i32.const 787
              i32.const 18
              call 550
              local.get 0
              local.get 1
              i32.const 1
              i32.add
              i32.const 16
              local.get 0
              i32.load offset=24
              i32.const 0
              i32.const 1783927
              i32.const 305
              call 2740
              local.tee 2
              i32.store
              i32.const 0
              local.set 4
              i32.const 787
              i32.const 44
              i32.const 18
              call 552
              i32.const 787
              i32.const 45
              i32.const 16
              call 552
              i32.const 787
              i32.const 102
              i32.const 5
              call 552
              i32.const 787
              i32.const 103
              i32.const 1
              call 554

>>> At this point the function block, two normal blocks, and three if-blocks are open. I see 4 end-hooks. Are two missing?

              br 3 (;@2;)
            else
              local.get 0
              i32.const 0
              i32.store
              i32.const 0
              local.set 4
              i32.const 0
              local.set 2

>>> At this point the function block, two normal blocks, two if-blocks, and one else-block are open. It seems, that some end-hooks are missing before the break. 

              br 3 (;@2;)
              i32.const 787
              i32.const 44
              i32.const 18
              call 552

>>> Is the hook after the break statement dead code? 

            end
            i32.const 787
            i32.const 45
            i32.const 16
            call 552
          end
          local.get 5
          i32.eqz
          if  ;; label = @4
            i32.const 787
            i32.const 48
            call 550
            local.get 0
            i32.const 0
            i32.store
            i32.const 1
            local.set 4
            i32.const 0
            local.set 2
            i32.const 787
            i32.const 57
            i32.const 48
            call 552
            i32.const 787
            i32.const 102
            i32.const 5
            call 552
            i32.const 787
            i32.const 103
            i32.const 1
            call 554
            br 2 (;@2;)
            i32.const 787
            i32.const 57
            i32.const 48
            call 552
          end
          local.get 0
          local.get 3
          local.get 1
          i32.const 1
          i32.add
          i32.const 16
          local.get 0
          i32.load offset=24
          i32.const 0
          i32.const 1783927
          i32.const 285
          call 2843
          i32.store
          local.get 2
          local.get 1
          i32.store
          i32.const 787
          i32.const 102
          i32.const 5
          call 552
          i32.const 787
          i32.const 103
          i32.const 1
          call 554
          i32.const 787
          i32.const 137
          i32.const 0
          call 554
          i32.const 787
          i32.const 138
          call 551
          return
        else
          local.get 1
          i32.const 16
          i32.lt_u
          if  ;; label = @4
            return
          else
            local.get 0
            local.get 1
            i32.const 1
            i32.add
            i32.const 16
            local.get 0
            i32.load offset=24
            i32.const 0
            i32.const 1783927
            i32.const 276
            call 2740
            local.tee 2
            i32.store
            i32.const 0
            local.set 4
            local.get 0
            i32.const 4
            i32.add
            local.set 3
          end
          i32.const 787
          i32.const 102
          i32.const 5
          call 552
        end
        i32.const 787
        i32.const 103
        i32.const 1
        call 554
      end
      local.get 0
      i32.const 4
      i32.add
      local.set 5
      local.get 2
      if (result i32)  ;; label = @2
        i32.const 787
        i32.const 109
        call 550
        local.get 2
        i32.const 787
        i32.const 111
        i32.const 109
        call 552
      else
        i32.const 787
        i32.const 111
        i32.const 109
        call 555
        local.get 5
        i32.const 787
        i32.const 113
        i32.const 111
        i32.const 109
        call 556
      end
      local.get 3
      local.get 0
      i32.load offset=20
      i32.const 1
      i32.add
      call 34070
      drop
      local.get 4
      if  ;; label = @2
        i32.const 787
        i32.const 122
        call 550
        local.get 3
        local.get 0
        i32.load offset=24
        call 2698
        i32.const 787
        i32.const 127
        i32.const 122
        call 552
      end
      local.get 0
      i32.load
      i32.eqz
      if  ;; label = @2
        i32.const 787
        i32.const 131
        call 550
        i32.const 787
        i32.const 133
        i32.const 131
        call 552
        i32.const 787
        i32.const 137
        i32.const 0
        call 554
        i32.const 787
        i32.const 138
        call 551
        return
        i32.const 787
        i32.const 133
        i32.const 131
        call 552
      end
      local.get 0
      local.get 1
      i32.store offset=4
      i32.const 787
      i32.const 137
      i32.const 0
      call 554
    end
    i32.const 787
    i32.const 138
    call 551)
danleh commented 5 years ago

Wow, that is some excellent investigation! I see what you mean by missing end hooks and I have to agree. I will have to defer this further to next week, but my next steps for debugging would be (more as a reminder for myself, but feel free to contribute if this blocks you):

  1. create a small example of 3-4 nested blocks and br/returns inside them.
  2. add printf/debug the block_stack code: Is the current block correct on the block stack? Is the branch/return target correctly resolved? Are the 'intermediate blocks' between the current and the target block correct? Maybe there is an off-by-one or sth similar in https://github.com/danleh/wasabi/blob/master/src/instrument/add_hooks/block_stack.rs#L132?
ghost commented 5 years ago

It seems the problem is with the detection of unreachable code. I understand the idea with the variable unreachable but it seems incorrect to me in some cases. The problem of this issue arises easily when an else statement follows a return statement. In this case unreachable is set but it incorrectly remains after the else statement. The following test case will reproduce the problem:

(module
  (func (export "main")
    (if 
      (i32.lt_s
        (i32.const 42)
        (i32.const 42)
      )
      (then
        return
      )
      (else
        return
      )
    )
  )
)

The problem seems easy to fix. However, when trying to add special treatment for the else case I suspected more general problems with the stack consistency. If unreachable is set and therefore we skip an end statement we might unintentionally leave elements on the stacks. It seems to me that they can easily become inconsistent that way. If I understand correctly, both stacks (block_stack and type_stack) should always have the same size and they should be empty when completing the Wasm module. Imo assertions wrt the stack size would be very valuable, especially on module completion. However, I don't understand the code well enough to do so and I could not get the cargo tests running. This is the first time I am touching Rust btw, so forgive me if I just don't know how to do it.

The else statement seems particularly tricky wrt unreachability: in some cases only the block-closing hooks are unreachable, in others also the block-opening hooks are.

I was also wondering if adding instructions for unreachable code something to be concerned about. It seems unlikely to have dead code of that sort in Wasm modules. I might be wrong though.

Here are some more test cases to download.

danleh commented 5 years ago

Thanks, that's an excellent example. You are correct, the code around the else is wrongly instrumented. For reference, Wasabi produces:

(module
  (type (;0;) (func (param i32 i32)))
  (type (;1;) (func (param i32 i32 i32)))
  (type (;2;) (func (param i32 i32 i32 i32 i32)))
  (type (;3;) (func))
  (import "__wasabi_hooks" "begin_function" (func (;0;) (type 0)))
  (import "__wasabi_hooks" "i32.const" (func (;1;) (type 1)))
  (import "__wasabi_hooks" "i32.lt_s" (func (;2;) (type 2)))
  (import "__wasabi_hooks" "if" (func (;3;) (type 1)))
  (import "__wasabi_hooks" "begin_if" (func (;4;) (type 0)))
  (import "__wasabi_hooks" "return" (func (;5;) (type 0)))
  (import "__wasabi_hooks" "end_if" (func (;6;) (type 1)))
  (import "__wasabi_hooks" "end_function" (func (;7;) (type 0)))
  (func (;8;) (type 3)
    (local i32 i32 i32 i32)
    i32.const 0
    i32.const -1
    call 0
    i32.const 42
    i32.const 0
    i32.const 0
    i32.const 42
    call 1
    i32.const 42
    i32.const 0
    i32.const 1
    i32.const 42
    call 1
    set_local 1
    tee_local 0
    get_local 1
    i32.lt_s
    tee_local 2
    i32.const 0
    i32.const 2
    get_local 0
    get_local 1
    get_local 2
    call 2
    tee_local 3
    i32.const 0
    i32.const 3
    get_local 3
    call 3
    if  ;; label = @1
      i32.const 0
      i32.const 3
      call 4
      i32.const 0
      i32.const 4
      call 5
      i32.const 0
      i32.const 7
      i32.const 3
      call 6
      i32.const 0
      i32.const 8
      call 7
      return
    else
      return
      i32.const 0
      i32.const 7
      i32.const 3
      call 6
    end
    i32.const 0
    i32.const -1
    call 5
    i32.const 0
    i32.const 8
    call 7)
  (global (;0;) (mut i32) (i32.const 1))
  (export "main" (func 8)))

I will look into the unreachable code detection. I assume you are right: That part of Wasabi was very quick&dirty to begin with, so entirely plausible that it has lingering bugs.

I would rather fix instead of fully remove the unreachable code detection though. The problem came up when I tested Wasabi against the official Wasm test suite. Those examples contain unreachable code and even real-world compilers (emscripten etc.) emit it every once in a while. Without the detection, Wasabi produces type incorrect code, which cannot be validated and thus not executed.

danleh commented 5 years ago

Out of interest though: without unreachable code handling (that is, in your own branch/the PR that you submitted), does it work correctly? If so, would that be a workaround for you, for now? (As long as you don't come across unreachable code in your examples.)

ghost commented 5 years ago

You're right, it does not solve the problem on the actual benchmark. I had tested it only with the small test cases. As we use Wasabi only to find similar functions, we can actually live with the bug for now.

danleh commented 5 years ago

@frgossen-g I think this should be fixed now, thanks again for reporting and the help!

If you can test again, that would be great. Afterwards I would close this.

ghost commented 5 years ago

Thanks for the fix!

danleh commented 5 years ago

Closing, let me know if it's not fixed completely.