byte-physics / igortest

Igor Pro Universal Testing Framework
https://docs.byte-physics.de/igor-unit-testing-framework/
BSD 3-Clause "New" or "Revised" License
7 stars 2 forks source link

Analyse branch coverage of complex if, do-while and for-statements #450

Open Garados007 opened 1 year ago

Garados007 commented 1 year ago

An if-statement can combine multiple checks with operators like && or || which can be understood as a complex setup of multiple if-statements. At the moment we do ignore all these possible branches and only perform a check if the combined statement is true or false. There is also no check for do-while-statements and for-statements.

Here is a small proposal that allows to check the coverage for all possible branches in if-statements, do-while-statements and for-statements.

Marking branching statements

All if-statements, do-while-statements and for-statements do get a unique statement-ID. This is collected in a global wave that contains for each statement-ID the file name and line number this statement is located. This global wave does also contain the number of possible branches for each statement.

Detection of all branches in if-statements and do-while-statements

Both statements do operate similar so I discuss both cases first. First we have to split the whole command line into it's essential building blocks. A if-statement starts with a (?i)^\s*if\s*\( and ends with a \)\s*$. Similarry, the do-while-statements starts with a (?i)^\s*while\s*\( and ends with a \)\s*$. Inside is everything that is a combination of some building blocks.

After removing the line start and end we have to check the content of each pair of paranthesis (...) recursively if there is a &&, ||, &, |, ? or %^. If there is one we have found multiple branching conditions that are combined into one if or do-while-statement. Mark each part left and right of &&, ||, &, |, ? or %^ as a branching block and check their brackets (if there is one) recursively again.

After all these recursive checks the argument of the if-statement (or do-while-statement) is also marked as a branching block if there wasn't any part of the argument marked as branching block.

Let's say we have the following if-statement:

if(a < 5 && b > 10 && c == 0)
    print "success", a, b, c
else
    print "fail", a, b, c
endif

This contains the at the top level the operator && which splits the argument in three branching blocks: a < 5, b > 10 and c == 0. All branching blocks doesn't contain any parenthesis (...) so this check isn't repeated recursively. Also, the whole if-argument a < 5 && b > 10 && c == 0 isn't marked as branching block because it already contain three branching blocks.

If a branching block contains at least two branching block recursively, so the parent branching block is not marked as a branching block.

Let's say we have the following do-while-statement:

while(a && (b || c))

This contains the following branching blocks: a, b and c.

All if-statements and do-while-statements contain at least one branching block. Each of these statements contain 2^n branches if n is the number of branching blocks.

Marking of all branches in if-statements and do-while-statements

Each branching block is inserted into a function call to ZB_. This function does get a variable igortest_bs_<id> (this is discussed later), the id of the branch and the result of the branching block as arguments and returns the resulting value of the branching block.

Each branching block has it's own id that is unique for this statement and is represented by a specific bit of a number. The first branching block sets the first bit (0x01), the second branching block the second bit (0x02) and so on.

The above if-statement with the inserted ZB_ functions. The if-statement received the ID 1337.

if(ZB_(igortest_bs_1337, 0x01, a < 5) && ZB_(igortest_bs_1337, 0x02, b > 10) && ZB_(igortest_bs_1337, 0x04, c == 0))

Detection and Marking of all branches in for-statements

The for-statements are a bit more complex than if-statements and do-while-statements. Each for-statement start with a (?i)^\s*for\s*\((?:[^;"]|"(?:[^\\"]|\\.)*")*; and ends with a ;(?:[^;"]|"(?:[^\\"]|\\.)*")\)*\s*$. Aside from that a for-statement works identically as an if-statement or do-while-statement.

Final setup

At the start of each instrumented functions is a list of igortest_bs_<id> variables introduced. Each variable is for each if-statement, do-while-statement and for-statement the function contains. The <id> part is replaced with the statement ID.

After applying the transformation above is the argument to the if-statement, do-while-statement and for-statement provided as an argument to the ZI_ function. This function also receives the return value of ZIS_ which itself received the statement ID and the variable igortest_bs_<id> as reference.

The above if-statement with the inserted ZI_ and ZIS_ functions. ZB_ is already inserted.

if(ZI_(ZIS_(1337, igortest_bs_1337), ZB_(igortest_bs_1337, 0x01, a < 5) && ZB_(igortest_bs_1337, 0x02, b > 10) && ZB_(igortest_bs_1337, 0x04, c == 0)))

The ZIS_, ZI_ and ZB_ functions

The ZIS_ function uses the statement ID to create a new entry on a stack for the current thread. Each thread stores it's own stack. The entry contains the statement ID and a 0 for the current branch sum. This function stores the stack index in the igortest_bs_<id> variable and returns it to the caller.

Each call to ZB_ checks if the argument can be evaluated to true or false. If the argument can be evaluated to true it will add the branching block ID (which was represented as a unique bit) to the current sum to the references stack index.

The last call to ZI_ finalize the currently recorded value at the provided stack index. It stores in a global wave that the branch with the given branching sum was triggered at least once (similarry, how the line numbers are counted). After that is the entry from the stack removed (and optionally move non-finalized entries upwards).

If a user function triggers an abort during the evaluation of the branching statement the evaluation of the argument of the ZI_ function is stopped and the current stack entry is never finalized. The non-finalized entries are kept until report generation and reported as warnings (or errors if configured). It is not recommended to abort during evaluation of arguments in if-statements (or do-while or for).

Issues

  1. Each branching statement can have up to 53 branching blocks. This is the number of bits in the mantissa of a variable. We can change it to 63 or 64 if we use uint64 or int64.
  2. These transformation do extend the length of the line a lot. It is possible that we reach the maximum line length of 1000 pretty easy. As far as I know there is no easy way to reduce the line length which works for if-statements, do-while-statements and for-statements and doesn't break the stack trace.
  3. This method ignores if some branch conditions are hidden away in another function and only respects the resulting value of functions.

Conclusion

In the end we have detailed data how many times each branch for all if-statements, do-while-statements and for-statements are executed and can be reported to Cobertura. This approach is also safe if the user aborts the execution during evaluation.

Garados007 commented 1 year ago

It's an idea I've had for a while, and I just wrote it down now so that it doesn't get lost.

t-b commented 3 weeks ago

This would be actually interesting to get supported. We also need to check if cobertura, https://docs.byte-physics.de/igortest/advanced.html#cobertura-reference, supports this level of fine-grained-ness.