Should we disable the test runner for some practice exercises?

angelikatyborska commented 3 years ago

From Slack:

Jeremy Walker 1:20 PM We mentioned on the call that you can disable the test-runner for specific exercises that you know it won't work for (e.g. exercises that take ages to run). The flag is here: https://github.com/exercism/docs/pull/169

If there is an exercise where many solutions will time out, but the optimal ones will pass in time (e.g. pythag triplets), it is worth mentioning this in the instructions:

Note: your solution must complete in under 10s for the tests to pass on the online editor and exercism website.

Remember, student's won't be able to submit in the editor until the tests pass, so you're effectively blocking students from submitting suboptimal solutions if the test_runner is enabled for these exercises.

Here's the test runtime for example solutions on my machine (3 year old MBP):

exercise	example solution test duration
accumulate	0.06 seconds
acronym	0.06 seconds
affine-cipher	0.07 seconds
all-your-base	0.06 seconds
allergies	0.1 seconds
alphametics	10.5 seconds
anagram	0.07 seconds
armstrong-numbers	0.06 seconds
atbash-cipher	0.07 seconds
bank-account	0.07 seconds
beer-song	0.04 seconds
binary	0.05 seconds
binary-search	0.06 seconds
binary-search-tree	0.1 seconds
bob	0.08 seconds
book-store	0.06 seconds
bowling	0.1 seconds
change	0.07 seconds
circular-buffer	0.09 seconds
clock	0.1 seconds
collatz-conjecture	0.04 seconds
complex-numbers	0.08 seconds
connect	0.04 seconds
crypto-square	0.08 seconds
custom-set	0.1 seconds
darts	0.04 seconds
diamond	0.03 seconds
difference-of-squares	0.03 seconds
diffie-hellman	0.08 seconds
dnd-character	0.06 seconds
dominoes	0.07 seconds
dot-dsl	0.1 seconds
etl	0.06 seconds
flatten-array	0.03 seconds
food-chain	0.04 seconds
forth	0.1 seconds
gigasecond	0.03 seconds
go-counting	0.05 seconds
grade-school	0.03 seconds
grains	0.04 seconds
grep	0.09 seconds
hamming	0.05 seconds
hello-world	0.03 seconds
hexadecimal	0.1 seconds
isbn-verifier	0.1 seconds
isogram	0.09 seconds
kindergarten-garden	0.1 seconds
largest-series-product	0.06 seconds
leap	0.05 seconds
list-ops	2.6 seconds
luhn	0.1 seconds
markdown	0.05 seconds
matching-brackets	0.08 seconds
matrix	0.05 seconds
meetup	0.2 seconds
minesweeper	0.06 seconds
nth-prime	12.2 seconds
nucleotide-count	0.04 seconds
ocr-numbers	0.06 seconds
palindrome-products	18.7 seconds
pangram	0.08 seconds
parallel-letter-frequency	0.09 seconds
pascals-triangle	0.04 seconds
perfect-numbers	0.7 seconds
phone-number	0.05 seconds
pig-latin	0.06 seconds
poker	0.1 seconds
pov	0.09 seconds
prime-factors	0.04 seconds
protein-translation	0.06 seconds
pythagorean-triplet	0.06 seconds
queen-attack	0.08 seconds
rail-fence-cipher	0.04 seconds
raindrops	0.05 seconds
rational-numbers	0.1 seconds
resistor-color	0.03 seconds
resistor-color-duo	0.03 seconds
resistor-color-trio	0.03 seconds
rna-transcription	0.03 seconds
robot-simulator	0.09 seconds
roman-numerals	0.05 seconds
rotational-cipher	0.04 seconds
run-length-encoding	0.04 seconds
saddle-points	0.06 seconds
satellite	0.06 seconds
say	0.05 seconds
scale-generator	0.1 seconds
scrabble-score	0.05 seconds
secret-handshake	0.04 seconds
series	0.05 seconds
sgf-parsing	0.06 seconds
sieve	0.03 seconds
simple-cipher	0.1 seconds
simple-linked-list	0.1 seconds
space-age	0.04 seconds
spiral-matrix	0.03 seconds
square-root	0.03 seconds
strain	0.05 seconds
sublist	0.4 seconds
sum-of-multiples	0.05 seconds
tournament	0.05 seconds
transpose	0.06 seconds
triangle	0.05 seconds
twelve-days	0.05 seconds
two-bucket	0.04 seconds
two-fer	0.04 seconds
variable-length-quantity	0.07 seconds
word-count	0.06 seconds
word-search	0.09 seconds
wordy	0.07 seconds
yacht	0.07 seconds
zebra-puzzle	0.03 seconds
zipper	0.06 seconds

angelikatyborska commented 3 years ago

Likely problematic exercises:	exercise	example solution test duration
alphametics	10.5 seconds
list-ops	2.6 seconds
nth-prime	12.2 seconds
palindrome-products	18.7 seconds
prime-factors	0.04 seconds
pythagorean-triplet	0.06 seconds

A naive solution of pythagorean-triplet takes way over 10s to pass the tests. There are hints on how to make it faster in hints.md.

In prime-factors there's a test case that's also meant to test if your solution is fast enough.

neenjaw commented 3 years ago

Can we just skip those specific test cases causing these times? maybe can add a "long-running" or "bonus" tag which is skipped by the test runner.

angelikatyborska commented 3 years ago

We could introduce a new tag that we only exclude in the test runner but not when the student runs the tests locally.

angelikatyborska commented 3 years ago

@iHiD Would it be ok with you if Elixir turns off the slow test cases instead of turning off the whole test runner?

This would mean that students in the web editor can still solve the exercise sub-optimally, but they must download it locally to work on an optimal solution and run the slow tests.

I could imagine this could require displaying some warning somewhere? In instructions.append.md maybe? I think the current test runner's interface wouldn't allow anything like that.

iHiD commented 3 years ago

That's a great idea :)

angelikatyborska commented 3 years ago

The mix test option --include overrides the option --exclude, so in order to exclude a tag like :slow in the test runner, we will need to stop relying on --include pending:true as a way to include pending tests. Pending tests aren't excluded by default, they're only excluded because of the test_helper.exs files that each exercise has.

jiegillet commented 3 years ago

I was playing around and I found a way. It can probably be cleaner, but it exists.

Here is how it works: I modified bin/run.sh to create a backup of test_helper.exs (like it does for the other files) and replace it by one that doesn't have exclude: :pending. Then, when running the tests, I replaced --include pending:true by --exclude slow and voila. Original files are restored after.

Would that work for us? Should I do a PR?

Are all test_helper.exs the same for all exercises? If somehow some are special that method may not work. EDIT: all practices exercises have the same one, but concept exercises don't. stack-underflow has options, and need-for-speed is the only one without seed: 0.

angelikatyborska commented 3 years ago

stack-underflow has options

that's on purpose

need-for-speed is the only one without seed: 0

That's an accident...

Hmm. I wonder what happens if you call ExUnit.configure two times in a row? If it overwrites the settings from the first call, then we could just append the desired settings instead of having to replace the whole file.

jiegillet commented 3 years ago

That's an accident...

I thought so. No other practice exercise has seed: 0 either, is that meaningful?

Hmm. I wonder what happens if you call ExUnit.configure two times in a row? If it overwrites the settings from the first call, then we could just append the desired settings instead of having to replace the whole file.

Let me try :)

jiegillet commented 3 years ago

It seems to stack up in a non-trivial way instead of overwriting. I tried

ExUnit.configure(exclude: :pending, trace: true)
ExUnit.configure(include: :pending)
ExUnit.configure(exclude: :slow)

:pending end up being included but :slow are not excluded (the test with :slow also has :pending).

jiegillet commented 3 years ago

Alternate method, a bit more subtle: sed -i 's/exclude: :pending, //' That might do the trick :)

angelikatyborska commented 3 years ago

That won't help us...

No other practice exercise has seed: 0 either, is that meaningful?

Yes. seed: 0 means tests will be run in the order they're defined in the file. This is usually undesirable because your tests shouldn't need a specific order to pass. So it's not used in practice exercises. But in concept exercises, it's helpful for learning to always have the failures in a predictable order.

I think we can make those two assumptions:

Only practice exercises will have slow test cases
Only concept exercises can have a custom test_helper.exs

Alternate method, a bit more subtle: sed -i 's/exclude: :pending, //'

Beware of:

user submitted custom test_helper.exs so it might have different stuff in there
removing commas not-in-a-smart way which might cause syntax errors

An Elixir script that works on the AST would be safer, if it's possible?

jiegillet commented 3 years ago

I know what seed: 0 does, I use it almost systematically when debugging, I was just wondering why concept exercises use it, and yes, it makes sense for beginners.

An Elixir script that works on the AST would be safer, if it's possible?

Sure, that's possible. Then would it be a new module TestHelper.Transformer or should I just rip off all the @tag :pending in TestSource.Transformer?

angelikatyborska commented 3 years ago

Adding new functions to TestSource.Transformer, e.g. transform_test_helper makes sense to me.

jiegillet commented 3 years ago

Well, I added literally three lines and it seems to work. Nice :) It's passed my bedtime, so I'll clean it up and do a PR later.

neenjaw commented 3 years ago

If we just give them a second tag, can we not just use the command line option --exclude using the test transformer seems the wrong way to do this.

Example:

# testing_test.exs
defmodule TestingTest do
  use ExUnit.Case
  doctest Testing

  @tag :pending
  @tag :slow
  test "greets the world" do
    assert Testing.hello() == :world
  end
end

# test_helper.exs
ExUnit.configure(exclude: :pending, trace: true)
ExUnit.start()

terminal output:

➜  testing mix test --include pending:true --exclude slow:true
Excluding tags: [:pending, {:slow, "true"}]
Including tags: [pending: "true"]

TestingTest [test/testing_test.exs]
  * test greets the world (excluded) [L#7]
  * doctest Testing.hello/0 (1) (0.00ms) [L#3]

Finished in 0.02 seconds (0.00s async, 0.02s sync)
1 doctest, 1 test, 0 failures, 1 excluded

Randomized with seed 446155

we can just append the excluded option at the end of the shell script

angelikatyborska commented 3 years ago

@neenjaw What you're suggesting doesn't work. --include pending overwrites --exclude slow (doesn't matter if :true or not).

Slightly modified zebra-puzzle test as proof, I added @slow to one of the cases:

angelika in ~/Documents/exercism/elixir/exercises/practice/zebra-puzzle (concept-exercises-review | ⚑1)
$ cat test/test_helper.exs 
ExUnit.start()
ExUnit.configure(exclude: :pending, trace: true)

angelika in ~/Documents/exercism/elixir/exercises/practice/zebra-puzzle (concept-exercises-review | ∙1⚑1)
$ cat test/zebra_puzzle_test.exs 
defmodule ZebraPuzzleTest do
  use ExUnit.Case

  @tag :pending
  test "resident who drinks water" do
    assert ZebraPuzzle.drinks_water() == :norwegian
  end

  @tag :pending
  @tag :slow
  test "resident who owns zebra" do
    assert ZebraPuzzle.owns_zebra() == :japanese
  end
end

angelika in ~/Documents/exercism/elixir/exercises/practice/zebra-puzzle (concept-exercises-review | ∙1⚑1)
$ mix test --include pending --exclude slow
Compiling 1 file (.ex)
Generated zebra_puzzle app
Excluding tags: [:pending, {:slow, "true"}]
Including tags: [pending: "true"]

ZebraPuzzleTest [test/zebra_puzzle_test.exs]
  * test resident who owns zebra (2.9ms) [L#11]

  1) test resident who owns zebra (ZebraPuzzleTest)
     test/zebra_puzzle_test.exs:11
     Assertion with == failed
     code:  assert ZebraPuzzle.owns_zebra() == :japanese
     left:  nil
     right: :japanese
     stacktrace:
       test/zebra_puzzle_test.exs:12: (test)

  * test resident who drinks water (0.01ms) [L#5]

  2) test resident who drinks water (ZebraPuzzleTest)
     test/zebra_puzzle_test.exs:5
     Assertion with == failed
     code:  assert ZebraPuzzle.drinks_water() == :norwegian
     left:  nil
     right: :norwegian
     stacktrace:
       test/zebra_puzzle_test.exs:6: (test)

Finished in 0.02 seconds (0.00s async, 0.02s sync)
2 tests, 2 failures

Randomized with seed 153179

neenjaw commented 3 years ago

~Ah, well that sucks that limitation isn't represented in the mix docs~

neenjaw commented 3 years ago

Reading the exunit source code, looks like test cases can only have one tag, but you can qualify them with a value. This works for me:

defmodule TestingTest do
  use ExUnit.Case
  doctest Testing

  @tag :pending
  test "greets the world" do
    assert Testing.hello() == :world
  end

  @tag pending: :slow
  test "greets the world again" do
    assert Testing.hello() == :world
  end
end

➜  testing mix test --include pending:true
Excluding tags: [:pending]
Including tags: [pending: "true"]

TestingTest [test/testing_test.exs]
  * test greets the world again (excluded) [L#11]
  * test greets the world (0.00ms) [L#6]
  * doctest Testing.hello/0 (1) (0.00ms) [L#3]

Finished in 0.02 seconds (0.00s async, 0.02s sync)
1 doctest, 2 tests, 0 failures, 1 excluded

Randomized with seed 147784

neenjaw commented 3 years ago

using just --include pending will run all of the tests, meaning we don't need to update our ci

angelikatyborska commented 3 years ago

Reading the exunit source code, looks like test cases can only have one tag,

Are you sure of that? We already use both :pending and :task_id and I didn't notice any problems.

angelikatyborska commented 3 years ago

@tag is registered as a cumulative attribute: https://github.com/elixir-lang/elixir/blob/master/lib/ex_unit/lib/ex_unit/case.ex#L295 I'm pretty sure we're safe using multiple ones for different concerns.

neenjaw commented 3 years ago

🤷🏻 All I know is that it didn't work as I expected from reading the source using different tags

jiegillet commented 3 years ago

@tag pending: :slow does work, it's unfortunate that it mixes two meanings pending and slow that should really be independent concepts. I also agree that changing the AST is a bit unfortunate, the solution I have replaces @tag :pending with @tag :none (or whatever) in the test files. I won't submit a PR then.

neenjaw commented 3 years ago

Why don't we just swap out the test_helper.exs file that has the ExUnit.config(...) that we need to just skip the slow tests. That way, can use whatever meaningful tag we want, and not bother with the fuss of manipulating AST.

jiegillet commented 3 years ago

We talked about that, but not all test_helper.exs are the same (details somewhere above). That's when Angelika proposed going into the test_helper.exs's AST to remove exclude: :pending. Which we could still do, but I realized removing the tags in the tests themselves can be done in 3 lines.

exercism / elixir

Should we disable the test runner for some practice exercises? #817