codewars / runner

Issue tracker for Code Runner
32 stars 8 forks source link

Prolog: Codewars runner can't handle assertions that produce better failure message #297

Open jcsahnwaldt opened 3 months ago

jcsahnwaldt commented 3 months ago

Describe the bug I tried to use assertions that lead to better failure messages: Expected 'X' Got 'Y' instead of just Assertion: 'X' == 'Y'.

I followed the instructions given in this Stack Overflow answer. The code produces the output I wanted, but the Codewars runner doesn't display it properly, and even seems to mark the test as passed.

To Reproduce Old code: Produces Assertion: 'X' == 'Y' on failure. Not very helpful, not clear which one is the expected / actual value.

test(string_lower_tests, [forall(member(Input-Expect, ["X"-"x", "A"-"a"]))]) :-
    string_lower(Input, Result),
    assertion(Result == Expect).

New code: Produces Expected 'X' Got 'Y' on failure, but the Codewars runner doesn't recognize it.

test(string_lower_tests, [forall(member(Input-Expect, ["X"-"x", "A"-"a"])), Result == Expect]) :-
    string_lower(Input, Result).

Expected behavior The Codewars runner should recognize the better failure messages (Expected 'X' Got 'Y' instead of Assertion: 'X' == 'Y') and show them in the usual way. And the runner should mark the test as failed.

Screenshots New code: Good failure messages, but the Codewars runner doesn't display them properly, and it doesn't quite recognize that the test failed:

Screenshot 2024-03-04 at 17 14 30

Old code: Codewars runner displays failure messages in the correct way and recognizes that the test failed, but the messages are not very helpful, because they don't distinguish between expected and actual value.

Screenshot 2024-03-04 at 17 15 02
hobovsky commented 3 months ago

Check if you can use patterns from my collection of authoring examples:

image

image

(Credits to @B1ts)

jcsahnwaldt commented 3 months ago

Thanks for the suggested workaround. I'll use it in https://www.codewars.com/kata/trilingual-democracy/prolog.

It's a bit cumbersome. For example, I have to use

Tests = [
    (group = "DDD")-(expected = 'D'),
    (group = "DDF")-(expected = 'F'),
    (group = "DDI")-(expected = 'I'),
    ...
],

instead of

Tests = [
    "DDD"-'D',
    "DDF"-'F',
    "DDI"-'I',
    ...
],

But for now that seemed acceptable. I'd rather have bloated code that produces better failure messages than compact code and confusing messages.

Of course, that's just a workaround, not really a fix for the problem.

Also, the failure messages are better, but there's room for improvement. The message Assertion: ? == 'F' (forall bindings = [group="FFF",expected='F']) is better than Assertion: ? == 'F' (forall bindings = ["FFF",'F']), but it would be great if the Codewars runner could process the output generated by PlUnit (for the Result = Expect code I mentioned above) and display something like Expected 'F' Got '?'.

error256 commented 3 months ago
test(string_lower_tests, [forall(member(Input-Expect, ["X"-"x", "A"-"a"])), Result = Expect]) :-
    string_lower(Input, Result).

Result is already unified, so wouldn't string_lower(_, _). just pass this?

jcsahnwaldt commented 3 months ago

@error256 I don't quite understand how it works, but the Result = Expect part is actually short for true(Result = Expect), which is an option of test(Name, Options). See https://www.swi-prolog.org/pldoc/man?section=unitbox

Kacarott commented 3 months ago

He is right though, string_lower(_, _). will pass tests of this form.

jcsahnwaldt commented 3 months ago

Oh, I see what you mean. To catch that pathological case, we'll have to use Result == Expect instead of Result = Expect. I'll update the issue description.

I just meant to say that the Result = Expect part in test(..., [..., Result = Expect]) isn't really a unification. I don't know how it works, but it prints the expected and actual values in case string_lower(Input, Result) doesn't produce the correct result. If Result = Expect were a unification, string_lower(Input, Result) would simply fail in that case (in the usual Prolog sense, not in the sense of a failed test), and PlUnit wouldn't produce a useful message.

jcsahnwaldt commented 3 months ago

I played with the workaround mentioned above a bit more. Two improvements that work well for my case and may also work for others: 1. Lists of tests don't need the cumbersome syntax. 2. Failure messages also show actual value.

sample_tests(Test) :-
    Pairs = [
        "FFF"-'F',
        "IIK"-'K',
        "DFK"-'I'
    ],
    as_tests(Pairs, Test).

random_tests(Test) :-
    Pairs = [
        "DDD"-'D',
        ...
        "KKK"-'K'
    ],
    random_permutation(Pairs, Permutation),
    as_tests(Permutation, Test).

as_tests(Pairs, Test) :-
    member(Group-Expected, Pairs),
    (trilingual_democracy(Group, Result) -> Actual = Result; Actual = undefined),  % predicate under test
    Test = (group = Group)-(expected = Expected)-(actual = Actual).

do_test(expected = Expected, actual = Actual) :-
    assertion(Expected == Actual).

test(sample_tests, forall(sample_tests(_-Expected-Actual))) :-
    do_test(Expected, Actual).

(group = Group)-(expected = Expected)-(actual = Actual) is basically a triple of key-value pairs that's getting packed and unpacked at different points. (That's how I understand it. I know very little Prolog, the description may be laughably wrong.)

In the _-Expected-Actual thing the code ignores the group (using _) to avoid a compiler warning (Singleton variable), but PlUnit will still print the binding group = "..." in the failure message, which is nice.

EDIT: The original version of the code above had this line:

trilingual_democracy(Group, Actual),

This meant that even trilingual_democracy(_, _) :- false. passed the tests, because it always fails and no tests are generated. So I replaced the line by this:

(trilingual_democracy(Group, Result) -> Actual = Result; Actual = undefined),

I hope the tests now catch all pathological cases...

jcsahnwaldt commented 3 months ago

The PlUnit documentation at https://www.swi-prolog.org/pldoc/man?section=testbody sounds like assertion/1 was meant for somewhat special cases, while the more common way to check a result is to add an option to test/2, e.g. test(..., [..., Result == Expect]).