ales-t / rjp

Rapid JSON-lines processor
Apache License 2.0
3 stars 0 forks source link

Testing scaffolding #4

Closed zouharvi closed 2 years ago

zouharvi commented 2 years ago

I managed to decouple the CLI from the main worker which I now call main_worker. There are some issues associated with it that are described in test.rs, such as the trait bounds etc.

Copying the description of the test_in_dirs function:

Runs all tests in the tests directory. This assumes that the directory tests contains multiple directories, each with at least two files: command and output. The command is executed and compared with the output (assert). The test is run from the top-level project directory but arguments in the command ending with .json and .tsv are automatically prefixed with the path to the subdirectory. The first parameter of the command is the file that's redirected to the worker (imagine prefixing the whole command with a <). To add more tests, simply copy one of the existing folders.

Pros of this approach:

Cons of this approach:

There are three options:

  1. Accept this code as is
  2. Find a way to resolve the cons but using the current structure
  3. Turn the written test code into an easily assertible function that takes in a string of commands (written in code, each in individual functions) and a path to expected output.

I personally lean towards 3 because it easily fixes all the current cons. What I don't like about it is that the command to run rjp is written in code. From the perspective of tests that seems rather like data and should therefore be in a separate file. But maybe there's an intermediate solution that combines the current directory structure with the more modular functions.

I currently added two test from your comment in #3 and did not think much about their validity. There are more issues with this code, such as panicking whenever rjp panics but I think that for the initial shot on testing it's ok. Apologies it took me so long to get to this.

Let me know whether option 3 is a good way to go and whether you have some more remarks. It should be fairly easy to implement it.

zouharvi commented 2 years ago

I rewrote the tests according to 2&3. Now the cargo test output looks as follows (second test was intentionally set to fail):

running 2 tests
test test::tests::simple_join ... ok
test test::tests::simple_index ... FAILED

Adding a new test now means adding a function such as this one:

#[test]
fn simple_join() {
    test_valid_rjp_output("a.json", "join b.json a,b", "a_join_b_output.json");
}

PS I also removed .json and .tsv from gitignore so that I could add the testfiles.

Update: This also allows for some rudimentary testing (once we add larger tests) with cargo bench -- --test.

zouharvi commented 2 years ago

Rebased.

ales-t commented 2 years ago

One downside of this solution is clutter in tests.

I think that at some point in the future, we will need to revisit the test structure. One option would be to structure resource files into directories named after individual tests (kind of like the initial approach).

But I don't see an elegant way to do this without repetition, while maintaining that each test is a separate function, so it's easy to see what failed.

zouharvi commented 2 years ago

Updated the code based on your comments.

I think that at some point in the future, we will need to revisit the test structure. One option would be to structure resource files into directories named after individual tests (kind of like the initial approach).

With the current test code that can still be done. The parameters passed into the test_valid_rjp_output just need to include the new directory.