aiken-lang / aiken

A modern smart contract platform for Cardano
https://aiken-lang.org

A problem with testing failures. Proposed solution: change the `fail` keyword so it can be used with expressions. #956

Closed · perseus-algol closed this issue 1 month ago

perseus-algol commented 1 month ago

Aiken's test framework is designed without assert* methods. Instead, a test is considered successful if it returns True, which is convenient for tests that are expected to succeed. For tests that must fail, Aiken provides the fail keyword.
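
For illustration, here is a minimal sketch of both styles (the function and test names are invented for this example):

fn is_zero(value: Int) {
  value == 0
}

test succeeds() {
  // passes: the body evaluates to True
  is_zero(0)
}

test errors() fail {
  // passes: the body halts with a division-by-zero error
  1 / 0 == 1
}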

The problem

The problem is that the fail keyword is designed to mark the whole test. It is not possible to check one specific expression for failure.

Why is it a problem?

Let's consider a test that must fail. We mark it with the fail keyword. The test body could contain some preparation code followed by the target expression itself. Our goal is to ensure that the target expression (and only it) fails; the preparation phase is expected to execute successfully. The problem with the fail keyword is that the whole test "fails" if ANY expression fails, for example an expect ... expression in the preparation phase, and the test is then marked as successful. This is not what the developer expects: the test is reported as passing while the test itself is damaged and the target expression was never even called (meaning the test was effectively skipped).

Example of a damaged test intended to check a failing case:

test damaged_test() fail {
  let x = 1 / 0    // will fail here because of Divide By Zero
  x == 2           // this expression will not even be checked
  // but whole test will be marked as PASSED. Which is not what we want.
}

Proposed solution

Change the purpose and scope of the fail keyword so that it applies to a single expression rather than the whole test.

For example, it should be possible to write the following:

fail 1 == 2  // True
fail validator.send(d, r, ctx) // True if validator returns False (or fails)
fail expect Some(x) = None  // True
fail expect datum: MyDatum = inline_datum // True if inline_datum is not of type MyDatum

Why is this problem important?

For on-chain validators it is more important to test failing cases than succeeding ones, because we need to ensure that every possible non-standard or abnormal usage leads to failure and prohibits spending.

rvcas commented 1 month ago

expect is already assert

perseus-algol commented 1 month ago

If the described problem can be solved by using expect, would you be so kind as to give an example?

In turn, I will give a counterexample demonstrating that expect cannot check an expression that causes an exception.

The first example works, because the validator doesn't cause an exception but simply returns False:

fn validator1(value: Int) {
  value == 0
}

test must_fail_1() {
  let one = 2 - 1
  expect False = validator1(one)
  True
}

The second example has problems (the test fails), which shows that expect doesn't work on failing expressions:

fn validator2(denominator: Int) {
  let x = 1 / denominator
  x == 1
}

test must_fail_2() {
  let denominator = 10 - 10
  expect False = validator2(denominator)
  True
}

Results:

    ┍━ demo/expect_as_assert ━━━━━━━━━━━━━━━━━━━
    │ PASS [mem: 2604, cpu: 1071066] must_fail_1
    │ FAIL [mem:  103, cpu:  660037] must_fail_2
    ┕━━━━━━━━━━━━━ 2 tests | 1 passed | 1 failed

I kindly encourage you to carefully read the "Why is it a problem?" section above to see the problem.

KtorZ commented 1 month ago

@perseus-algol I do not understand either what problem you are trying to highlight here. As far as I can tell, the behavior is expected in both cases.

In must_fail_1, the validator returns False, the expect matches, and so the entire test passes. In must_fail_2, you are attempting a division by zero, which causes validator2 to error and never return anything. So the entire must_fail_2 execution fails, which causes the test to be marked as FAIL, because it wasn't expected to fail.

Maybe you are confused by the semantics of expect and of Plutus errors. In particular, this:

a test is considered successful if it returns True

is not quite accurate. Or more specifically, to be considered successful a test must first and foremost return (a valid expression). An error (a.k.a. fail) is not an expression; it's a program halt, an exception if you want. As soon as one is encountered, the entire execution halts. And that's what is happening with your division by zero.
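
To illustrate the halting behavior (a minimal sketch; the test name is invented): the bound value below is never used, yet the test body still halts, because an error is not a value that a binding could discard:

test halt_is_not_a_value() fail {
  // evaluation halts at the division; the binding never completes
  let _ignored = 1 / 0
  // never reached
  True
}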

KtorZ commented 1 month ago

If I go back to your original description, it seems that what you want is to ensure separation of fixture code from test logic. That is possible through the use of zero-argument functions in Aiken, which are evaluated during compilation. So you could write:

fn validator2(denominator: Int) {
  let x = 1 / denominator
  x == 1
}

fn my_fixture() -> Bool {
  let denominator = 10 - 10
  validator2(denominator)
}

test must_fail_2() {
  expect False = my_fixture()
  True
}

This would result in a compiler panic, because my_fixture would be evaluated and would "throw" during compilation:

   aiken::fatal::error
   crates/aiken-lang/src/gen_uplc.rs:4468:33

       Evaluated a zero argument function and received this error: DivideByZero(
       1,
       0,
   )

It's not as pretty as it could be, but it likely would solve your problem?

perseus-algol commented 1 month ago

@KtorZ, thank you for elaborating on this topic.

In your example, my_fixture is not really a fixture because it contains the validator call. But I understand the proposed solution.

So in order to check a failing case, we actually need to follow this pattern:

fn validator_fn(denominator: Int) {
  let x = 1 / denominator
  x == 1
}

fn calc(a: Int, b: Int) {
  a / b
}

fn test1_fixtures() {
  calc(0, 1)
}

test test1() fail {
  let f = test1_fixtures()
  validator_fn(f)
}

Therefore, to test a case where we expect failure, we must extract all preparation into a zero-arg function (so it is evaluated at compile time), then write a test function marked with the fail keyword containing two expressions: 1) get the fixtures, 2) call the validator that we expect to fail. Since the fixtures are defined in a zero-arg function, we can be confident that they evaluated successfully (otherwise we get a compilation error), and a failure can only happen in the validator call.

Honestly, that pattern doesn't look like a convenient dev experience to me. Something similar to try/catch (for tests only, of course) would be better, IMO (see the proposal in the first message).

KtorZ commented 1 month ago

Honestly, I am not sure what problem you have exactly with fixture code causing the test to fail. If your fixture is not working as expected, then the test as a whole is not working as expected.

So I don't see any problem with the current behavior, especially in the example you gave. If you're dividing by 0, then surely your test fails, although I agree, maybe not for the reasons you originally expected. But this is why traces are useful; primitives like "expect" will leave a trace that lets you know whether your assertion failed or not.
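
For instance (a minimal sketch adapting must_fail_2; the test name and trace message are invented, assuming Aiken's trace keyword), a trace placed between the fixture and the target call tells you how far evaluation got:

test must_fail_2_traced() fail {
  let denominator = 10 - 10
  trace @"fixture evaluated, calling the validator"
  validator2(denominator)
}

If the trace shows up in the test output, the fixture completed and the failure came from the validator call; if it doesn't, the fixture itself failed.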

Unfortunately, try/catch or something of that nature is not possible on the Plutus Virtual Machine. Errors cannot exist at the value level. There are various CIPs on the topic if you're interested in digging in. So that's not a problem we can solve at the Aiken level, other than in the way I pointed out above, if you really want to separate preparation code from your test logic.

perseus-algol commented 1 month ago

So I don't see any problem with the current behavior

While we have found a way to test failing cases, the remaining concerns are DevX and convenience.

In other languages, among the assertions there is assertThrow, which ensures that the passed function throws an exception. This is how we could write the test from the example above if we had the same feature; let's use the fail keyword here as an assert-throw:

test test1() {
  let arg = calc(0, 1)
  fail validator_fn(arg)
}

But since we don't have such a keyword/feature, we must extract the fixtures into a zero-arg function like so:

fn test1_fixtures() {
  calc(0, 1)
}

test test1() fail {
  let f = test1_fixtures()
  validator_fn(f)
}

So you can see the difference: the first example, with the proposed fail assertion, is more concise and convenient than the second one.

Currently, developers must always extract fixtures into a zero-arg function for every fail test. A fail test must consist of exactly: 1) getting fixtures from zero-arg function(s), and 2) the target expression, i.e. the validator call.

Because the described pattern is not documented and not obvious, developers will most likely write it in the seemingly natural way:

test test1() fail {
  let arg = calc(1, 0) // Here we get an exception during fixture preparation, which makes the test PASS.
  // That is a mistake: the validator was never called, and the error occurred before it was even checked.
  validator_fn(arg)
}

And that will be a mistake (which could cost a lot in the world of crypto).

A programming language should help you write correct code. In the case of testing failures, it is currently easy to write incorrect code. This can be improved in the way I described in the first message.

Unfortunately, try/catch or something of that nature is not possible on the Plutus Virtual Machine.

I think that this is fine. My proposal does not affect the Plutus Virtual Machine; it could be implemented at the Aiken compiler level. Suppose that we have the following test with two fail expressions:

test test1() {
  let arg = calc(0, 1)
  fail validator_fn_1(arg)
  fail validator_fn_2(arg)
  arg == 0 // fail doesn't have to be the last expression; we can write more expressions after it
}

then it should be compiled into:

fn test1_fixtures() {
  calc(0, 1)
}

test test1_1() fail {
  let arg = test1_fixtures()
  validator_fn_1(arg)
}

test test1_2() fail {
  let arg = test1_fixtures()
  validator_fn_2(arg)
}

test test1_3() {
  let arg = test1_fixtures()
  arg == 0
}

I think this is possible to achieve (without touching the Plutus VM), which would make Aiken more convenient for developers and less error-prone.