Tests for HSLuv/HPLuv clogging up the test runner

LeaVerou commented 5 months ago

It’s great that we have so many tests for these new color spaces (16K tests!) but that means that now 99% of our testsuite is testing these two color spaces, and npm run test now takes several minutes (!).

I propose we pick a few characteristic tests to run with a regular testsuite run, and the rest can be run explicitly (via e.g. htest test/hpluv/all.js), when we really need to thoroughly test correctness for these color spaces.

(also if there's anything on the hTest side that could improve that experience, I'm all ears!)

facelessuser commented 5 months ago

I think it is a matter of efficiency and testing what matters. In my library, I can test all relevant tests pretty fast, and Python is not known for its blazing fast speed 🙃 .

➜  coloraide git:(main) ✗ python3 -m pytest tests/test_hsluv/test_snapshot.py
========================================================================================== test session starts ==========================================================================================
platform darwin -- Python 3.11.3, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/facelessuser/Code/github/coloraide
plugins: xonsh-0.13.4
collected 1 item

tests/test_hsluv/test_snapshot.py .                                                                                                                                                               [100%]

=========================================================================================== 1 passed in 1.28s ===========================================================================================

My comment from the test:

        The snapshot contains a number of unnecessary tests, at least for our purposes. It is sufficient to verify that
        our path from sRGB to HSLuv and HPLuv is correct, both forward and backwards, as our conversion path is noted
        below:

        srgb -> srgb-linear -> xyz-d65 -> luv -> lchuv -> hsluv
        srgb -> srgb-linear -> xyz-d65 -> luv -> lchuv -> hpluv
        ```

    Verifying each step in the chain makes no sense. If we are able to get through this entire chain and the sRGB
    to HSLuv checks out, both forward and backwards through the path, then we can consider our implementation
    successful. Any failure in the chain would cause errors to propagate up that would cause the test to fail. It
    would only be useful to test each point in the chain if we were trying to identify where in the chain breakage
    occurred.



You can see what I actually test here: https://github.com/facelessuser/coloraide/blob/main/tests/test_hsluv/test_snapshot.py

I would advise testing only the forward and backward conversion from HSLuv <-> sRGB and HPLuv <-> sRGB. Don't test XYZ as it is tested in the chain already. I wanted to formally pass the HSLuv and HPLuv tests to claim it was compliant, so I did not test all their data points for Luv and LCh, because again, if I can pass the HSLuv and HPLuv tests, I'm already passing LChuv and Luv. You can do a small set of Luv/LCh tests for completeness if you like.

One thing I'm uncertain of is whether part of the problem is that the test suite generates individual tests or each case generating excessive overhead? If so, I would consider maybe having a snapshot test, one test that validates the snapshot, so no additional overhead per case in the JSON. I imagine, even if I were to test every data point in the JSON in my own test suite, I could only see it increasing the snapshot test time by a little over double. So, I suspect some excessive overhead is generated by the test suite per test case in the JSON. But maybe I'm wrong.

lloydk commented 5 months ago

It’s great that we have so many tests for these new color spaces (16K tests!) but that means that now 99% of our testsuite is testing these two color spaces, and npm run test now takes several minutes (!).

That was the reason I initially had the tests in the hsluv directory but somebody suggested moving them to the top level test directory. :smiley:

I propose we pick a few characteristic tests to run with a regular testsuite run, and the rest can be run explicitly (via e.g. htest test/hpluv/all.js), when we really need to thoroughly test correctness for these color spaces.

I'm fine with that.

(also if there's anything on the hTest side that could improve that experience, I'm all ears!)

The majority of the time is spent updating the console with the countdown of the number of tests remaining.

More specifically converting result to a string in this code causes the majority of the slowdown:

https://github.com/LeaVerou/htest/blob/bc877e6a806f759d29976e9524dbf25f70f12874/src/env/node.js#L65-L79

And I think this line in the toString() method could be problematic because ret.children might have 4096 tests (x4 if you're running the whole test suite)

https://github.com/LeaVerou/htest/blob/bc877e6a806f759d29976e9524dbf25f70f12874/src/classes/TestResult.js#L367

lloydk commented 5 months ago

I would advise testing only the forward and backward conversion from HSLuv <-> sRGB and HPLuv <-> sRGB. Don't test XYZ as it is tested in the chain already. I wanted to formally pass the HSLuv and HPLuv tests to claim it was compliant, so I did not test all their data points for Luv and LCh, because again, if I can pass the HSLuv and HPLuv tests, I'm already passing LChuv and Luv. You can do a small set of Luv/LCh tests for completeness if you like.

The tests only check sRGB to/from HSLuv and HPLuv.

lloydk commented 5 months ago

Note that the test times reported by htest are misleading and don't account for any overhead added by htest.

facelessuser commented 5 months ago

Yeah, that overhead was what I was afraid was happening.

lloydk commented 5 months ago

I hacked in some throttling to the node runner and the results look pretty good:

I did make some modifications to the hsluv tests to not parse color strings or create Color objects which speed up the test times reported by htest by around 3x.

Changes I made to src/env/node.js:

    done: (result, options) => {
        let tree;
        // Get the tree every 16ms or when there are no pending tests
        if (result.lastUpdate === undefined || performance.now() - result.lastUpdate > 16 || result.stats.pending === 0) {
            result.lastUpdate = performance.now();
            let messages = result.toString({ format: options.format ?? "rich" });
            tree = getTree(messages).toString();
            tree = format(tree);
            logUpdate(tree);
        }
...

I also had to comment out this line in the same file because I was getting a "console.log is not a function" error.

//import console from "./console.js";

Changing from 16ms to 100ms made things slightly faster but less smooth.

lloydk commented 5 months ago

Keep in mind that my PC is around 10 years old so those are probably close to worst case results

LeaVerou commented 5 months ago

Good catch, throttling on the hTest side would probably be a good thing regardless. But even if perf was not an issue, I'm not sure having 8K tests testing a single color space is a good thing.

Note that the test times reported by htest are misleading and don't account for any overhead added by htest.

That is intentional, since you typically care more about how long your code takes to run rather than the test runner. They just don 't help in this case 😁

facelessuser commented 5 months ago

I'm not sure having 8K tests testing a single color space is a good thing.

I generally agree. I think it is a bit excessive. I assume it was done for the same reason I did, you are implementing a space, it has a test suite, so you verify against the test suite. If overhead is not a problem, it is completed fairly quickly.

But, I think at this point, we've verified the spaces, we can probably pick some data points at random as a sanity check, and call that sufficient maybe.

lloydk commented 5 months ago

The conversion tests already contain a few tests for HSLuv and HPLuv so I'll just move the comprehensive test suite back to the hsluv directory.

color-js / color.js

Tests for HSLuv/HPLuv clogging up the test runner #487