Coverage not always reported for functions which raise errors

dorranh commented 3 years ago

Hi @aantron, we've been using bisect_ppx 2.6.1 for an OCaml project and wanted to get your feedback on a behavior we've observed.

In our codebase, we define a helper function that internally calls Stdlib.failwith which we use to throw exceptions. When calling this function within the scope of a function definition, the coverage report generated by bisect_ppx correctly shows the call counts for this function and reports the line as covered.

However, when calling the same function from within a let binding, bisect_ppx reports that the function is never called. We've prepared a minimal example here, most of which I've pasted below. Is there a way to get bisect_ppx to report coverage in this situation? Replacing failwith_error_code with direct calls to failwith in the below example marks these clauses green.

Thanks in advance for your help!

(* example.ml *)

let error_ZeroInput     : int = 1
let error_NegativeInput : int = 2

let failwith_error_code (i: int) : 'a = Stdlib.failwith (string_of_int i)

let half_positive_integer (x: int) =
  let foobar =
    if x < 0 then
      failwith_error_code error_NegativeInput (* always considered uncovered, marked as red *)
    else if x = 0 then
      failwith_error_code error_ZeroInput     (* always considered uncovered, marked as red *)
    else
      x / 2
  in
  foobar

(* tests.ml *)

open OUnit2
open Example

let suite =
  "Example Tests" >::: [
    "X" >:: (fun _ ->
      assert_raises
        (Failure (string_of_int error_ZeroInput))
        (fun () -> half_positive_integer 0)
    );

    "Y" >:: (fun _ ->
      assert_raises
        (Failure (string_of_int error_NegativeInput))
        (fun () -> half_positive_integer (-1))
    );

    "Z" >:: fun _ -> (
      assert_equal
        50
        (half_positive_integer 101)
      )
  ]

let () = run_test_tt_main suite

dorranh commented 3 years ago

Thanks for re-opening @aantron. I used a poor choice of phrasing in our unrelated PR :sweat_smile:

aantron commented 3 years ago

Yep, caught that :P The issue got closed right as I was getting to work on it :D

aantron commented 3 years ago

This behavior of Bisect is as expected. The unvisited points you are seeing are the out-edges of the application expressions, visited by normal return of the functions, and, of course, these functions never return. You can suppress these points (somewhat imprecisely) by annotating the functions inside the applications with [@coverage off]:

(failwith_error_code [@coverage off]) error_ZeroInput

This doesn't happen with failwith because Bisect_ppx automatically suppresses instrumentation of the out-edge of failwith applications, as a convenience, so that people don't have to insert [@coverage off] themselves:

https://github.com/aantron/bisect_ppx/blob/b266182961be6ae16fa38f7d5d1fc0d62e31644d/src/ppx/instrument.ml#L1093

aantron commented 3 years ago

If you use functions like this pervasively, and don't want to insert [@coverage off] at every call site, it may be possible to add some command-line option that can give Bisect additional identifiers to suppress out-edge instrumentation for.

gkaracha commented 3 years ago

@aantron thank you for the explanation, it's been really helpful in understanding the issue!

I spent more time digging into this and I noticed that if-then-else and match-with get colored quite differently:

If-then-else	Match-with

Since failwith never returns, I understand why we don't mark it green when it's on the right-hand side of a branch (whether that is an if-then-else branch or a match-with branch). However, with match-with we can clearly see which branches/clauses are matched in the tests, which is just as good (if we know that the pattern is matched, we also know that the right-hand side has been executed):

Match-with, with pragmas, with failing tests	Match-with, with pragmas, without failing tests

With if-then-else it's unclear whether we ever reached the else-clause, because else is never colored. In fact, in this minimal example, switching the if-then-else to a match-with gives the exact same coverage percentage (77.78%). However, in our own codebase changing from if-then-else to match-with in a function changes coverage from 100% to 96.97% (correctly, these branches are indeed not covered by our tests):

If-then-else (reported 100%)	Match-with (reported 96.97%)

This is a bug, right?

aantron commented 3 years ago

I spent more time digging into this and I noticed that if-then-else and match-with get colored quite differently:

In the first pair of images, I don't see the difference, besides that for match, there is a pattern to place the mark on, so Bisect puts the marks on patterns, while in the if-else, it has to put the mark on the first character of subexpressions.

I'm not sure what the middle two images are comparing (probably due to not being familiar with the tests).

In the last pair of images, in the if case, there should be a mark on line 269 on the Ligo.failwith, but there isn't. I wonder if that's some sort of an interaction between if not having a pattern to put a mark on, and special treatment of failwith (though I would have expected the Ligo. qualifier to matter). That does indeed look like a bug. Is that what you are referring to?

gkaracha commented 3 years ago

In the first pair of images, I don't see the difference, besides that for match, there is a pattern to place the mark on, so Bisect puts the marks on patterns, while in the if-else, it has to put the mark on the first character of subexpressions.

I'm not sure what the middle two images are comparing (probably due to not being familiar with the tests).

Sorry, I should have been more clear; indeed that was the point I was trying to make. As far as I can tell, with match-with, marks are placed on both the patterns and the right-hand sides, which gives pretty accurate results even if we have [@coverage off] on the failwiths. That's more or less what I was trying to point out with the "Match-with, with pragmas, without failing tests" figure.

With if-then-else the markers are placed on the expression (there are no patterns, but I guess we could mark the "else" itself?), which means that we get no marking if the expression in the else is a failwith.

In the last pair of images, in the if case, there should be a mark on line 269 on the Ligo.failwith, but there isn't. I wonder if that's some sort of an interaction between if not having a pattern to put a mark on, and special treatment of failwith (though I would have expected the Ligo. qualifier to matter). That does indeed look like a bug. Is that what you are referring to?

Exactly! My expectation is that the last two figures should have the exact same coverage percentage reported.

PS. Sorry for the images not referring to a minimal example, it's because I don't have one yet, but I will try to craft one.

gkaracha commented 3 years ago

@aantron I think I finally have a minimal example :tada: Apparently the issue does not have to do with whether we use failwith directly or not, but it looks like a weird interaction between if-then-else and the expressions in the branches having a type signature on them (I really didn't see that coming).

So, example.ml contains the following:

let half_positive_integer (x: int) =
  let foobar =
    if x > 0
    then
      x / 2
    else
      (* (failwith "some error")        (* without type sig *) *)
      (* (failwith "some error" : int)  (* with type sig *)    *)
  in
  foobar

and the testsuite (tests.ml) covers only the if-then part:

open OUnit2
open Example

let suite =
  "Tests" >::: [
    "Y" >:: (fun _ ->
      assert_equal
        50
        (half_positive_integer 101)
    );
  ]

let () = run_test_tt_main suite

Depending on whether the expression the else branch has a type signature on it or not, we get different results:

without signature (correct: 66.67% reported)	with signature (wrong: 100.00% reported)

Let me know if that helps!

aantron commented 3 years ago

Let me know if that helps!

Yes! Sorry about the delay. This is very helpful!

I made an even smaller case

let () =
  if true then
    (() : unit)
  else
    ()

Снимок

...and found that this is due to Bisect's interaction with an apparent AST bug in the compiler, which I asked about in https://github.com/ocaml/ocaml/issues/10554. I suggest to see what that discussion yields. Bisect might need a workaround for the bug. Alternatively, if the bug will be fixed and you are satisfied with having this be slightly broken until the next OCaml release, we can just await that.

Let's keep this issue open to track this on the Bisect side, and in case I am wrong about this being a problem in OCaml.

gkaracha commented 3 years ago

Thanks for following up on this!

...and found that this is due to Bisect's interaction with an apparent AST bug in the compiler, which I asked about in ocaml/ocaml#10554. I suggest to see what that discussion yields. Bisect might need a workaround for the bug. Alternatively, if the bug will be fixed and you are satisfied with having this be slightly broken until the next OCaml release, we can just await that.

Let's keep this issue open to track this on the Bisect side, and in case I am wrong about this being a problem in OCaml.

Yeah, I agree. For the project we're working on we'd definitely like to hit 100% test coverage eventually, but in the meantime there's other stuff to work on. Let's wait and see :slightly_smiling_face:

aantron commented 2 years ago

It looks like the upstream OCaml fix has been merged and will be included in the next OCaml releases (5.00 and 4.14, as I understand it). Closing this issue. Thanks for reporting!

aantron / bisect_ppx

Coverage not always reported for functions which raise errors #377