Open fingolfin opened 4 years ago
I am interested in what people would like the output to be, and how unhappy they are with the current output. There are various things, which are various levels of difficult to implement, but it would be nice to start looking at what is actually wanted.
In the past my main concern was having lines which were impossible to turn green, because due to how GAP worked some lines ended up marked as "read" but never "executed". I believe these are all now fixed, but there is still plenty of weirdness.
@ChrisJefferson I agree that coverage data is reasonably good now (a big "thank you" to you for that!). But as outlined it could be better, and indeed the first step should be to "start looking at what is actually wanted", as you put it. Which is why I outlined several suggestions for that above. Would be great to hear your thoughts on that. Esp. the two short code snippets where I state which of them I'd like to see marked as code and which not. We can then argue about that ;-)
Short version: Coverage of interpreted and coded/executed code is not consistent with each other, and we should probably come up with some good rules for what we want to consider as "code" and what not before making any attempts to rectify this.
Long version: I noticed that PR #3839 reduced coverage, and started to investigate. According to Codecov, it removes about 6000 lines of code (in the sense that lines which were recognized as code before are no longer), most of which were considered as "executed", hence overall coverage actually drops.
I don't claim to have studied this in detail yet, but a major source of this seems to be that the
end
in an interpretedfunction() ... end
expression now is generally not considered as code, while before, it usually (but not always!) was registered as such. One can see this by e.g. comparing the following two side by side:If you look at line 283 (almost at the end of the file), you see an
end
that is not marked as code in both. And I think this gives a hint at what's going on, namely I believe there is a general deficiency in the way we track interpreted code lines right now: based on some experiments I made, my impression is that often a line is marked as code only "by accident", during the execution of the next "line of code" / next statement. I believe that is/was the case for theend
here as well, which is why it wasn't covered whenend
was the last code in the file. But at this point, this is a conjecture, as I did not yet actually debug this deeperLet me point out that it is not even clear to me whether this change is a bug or a feature; i.e., is the changed behavior a regression, or an improvement (or perhaps neither)? After all, the
end
doesn't really do anything, and so arguably is not "executed".That said, it certainly is inconsistent with how coverage works in coded (not interpreted) code, as can be seen here, as
end
is marked as code there, both before and after:Yet sometimes,
end
is marked as code even with this PR, see e.g. init.g:167, so that is a stroke against the behaviour in this PR whatever it is, it should at least be consistent). Of course, as my example above shows, things are not consistent wrt toend
already inmaster
.And there are also things were coverage arguably improved with this PR, cf. these (and in addition, also compare the last line of the two files)
In the same file, we see a method whose body appears to be "covered" in
master
, but I think (but have not verified) that this actually is a fluke, caused by the statement being closed on the same line; this is visible as non-executed code with this PR (and I think this would be an argument for not covering lines as "code" if they only contain isolated keywords likeif
,end
,)
;There are more inconsistencies when looking at
else
,fi
,function
and probably more, e.g.:function(...)
is not marked as code, both with and without this PR, both in interpreted and coded sectionselse
is nor marked as code when coded, but is marked as code when interpreted; whilefi
is always marked as code: compare init.g:190 vs init.g:112All in all, I think with the current approach for tracking interpreted code, we'll never quite resolve this; we need to use a more sophisticated approach for that, just hooking into
Match
is not enough. For example, if I write:Then which lines should be marked as code? I would argue all except for that with the comment, and the one with the semicolon: because the other each trigger some kind of computation, being the lookup of an identifier
x
(is it lvar/hvar/gvar/dvar?), executing an assignment, immediate constant 1, performing addition.Of course this is highly artificial code, nobody would write it like that (I hope). My goal here is not to get coverage for such artificial code right, but rather to determine if we can come up with some rules what should be considered code and what not; only then can we try to implement those rules (the needs and issues of that might then force us to change those rules, but my point here is that until we agree what we want to be counted as code and what not, it's impossible to fully resolve this situation).
Here is another example:
I marked all lines I'd consider as "code". IMHO
then
,;
andfi
are just "delimiters". This is supported by languages like Python or Lisp needing none of them. Although arguably Python substitutesthen
by:
plus a newline; and Lisp uses parenthesis to group things; but my point stands that these devices simply exist to make clear what belongs where, but don't directly translate to instructions. Of course that's not a really rigorous argument, I am simply trying to develop some heuristics.... One reason for not wanting to treatfi
,end
,od
,)
as code is that it avoids issues like in grpfp.gi:90 which is marked as covered but shouldn't be, and I suspect the) );
on the same line is the culprit, though I am not sure.Anyway, this line of thinking also justifies why
local ...;
is not marked as code. I am somewhat torn onfunction(...)
, but I am now leaning towards saying it should be marked as code, just likeif
andrepeat
andwhile
.Looking at coverage for our C code, we see that
{
and}
are not marked as code there either, nor are simply variable declarations likeint i
. Note that the start of a function likestatic void InstallZeroMutObject ( Int verb )
(from ariths.c:137) is marked as code, which is consistent with my idea of treatingfunction(...)
as code, too.Conveniently, these rules mean that PR #3839 is "correct" in the way it changes coverage :-) but that is not the motivation for my reasoning. I am absolutely willing to rework PR #3839 as necessary.