Cmake labels (take-over #2690)

LecrisUT commented 3 months ago

Continuing where @uyha left off

LecrisUT commented 3 months ago

Replying to the comments in the other PR. @horenmar, when you have the time:

The changes here introduce hard dependency on relatively new CMake versions (3.19 for JSON support, potentially more around the label handling), so in the end the new script should live side by side with the old one for some time.

What is the procedure of the minimal CMake requirement? My suggestion is to support the latest distro LTS. Are there stricter packaging/usage requirements otherwise?

Also the add_command(set ${_TEST_LIST} ${tests}) around line 183 is currently broken, this is the result in the script file:

The file file with the semicolon in name is split into two elements in the list, which it is not supposed to be.

Not sure what is meant there.

The iterative handling of JSON in CMake is slooooow

Fair point, not sure how to improve on this. My intuition is to use string(JSON REMOVE) and only query for the 0th item

horenmar commented 3 months ago

Re CMake JSON handling: CMake JSON functions can take lists of arguments, so e.g. string(JSON tags GET ${json_tags} 0 1 2 3 4) returns list of parsed tags from json_tags on indices 0, 1, ..., 4 in tags. This provides much better performance, because it avoids a lot of the JSON reparsing issue.

As I mentioned before, this is not needed on tags, as there won't be many of them, but for test cases this is likely to make significant difference. And for tags it should be actually easy, unlike for test cases, so it might as well be done there. Note that trying the REMOVE the used elements is likely to be worse, as it forces another JSON reparse per element.

For context, my current job has over 1k tests for the main project. Imagine reparsing the whole JSON with 1k tests 1k times.

Re add_command(set ${_TEST_LIST} ${tests}): The test list variable is supposed to contain a CMake list, of the CMake/CTest names of all added (discovered) tests. Instead it contains the JSON output from Catch2, and then appends the test names badly. -- This seems to be fixed in your last commit.

[==[@Script[C:\EPM1A]=x]==] [==["SCALA_ZERO:"]==] is three elements in the list, but the original test name is @Script[C:\\EPM1A]=x;"SCALA_ZERO:" and should be appended as one. See tests/TestScripts/DiscoverTests/register-tests.cpp and friend. -- I haven't checked whether this is fixed in the last commit.

tests/TestScripts/DiscoverTests/ is also where you can hang on more tests for the discovery output.

Re CMake minimum version: No procedure, just vibes. Outside of these helper scripts, Catch2's CMake does not need any advanced features from CMake, so it does not need to make decisions on whether to bump min CMake version or not.

For the helper scripts, a survey of currently supported LTS versions of common Linux distros would be a nice starting point. However I suspect that the result is that CMake <3.19 is still common, so there are two options that I see here.

1) Say that newer CMake is easy-enough to install from kitware's repos, and rely on new-enough versions of CMake. 2) Maintain both new and old registration and pick based on the current CMake version.

I don't want to just tell people that they have to keep around the scripts from the older releases, because the newer ones won't work for them, so I think the second option is preferable. Especially since with some work, the discovery methods can be encapsulated into simple functions -> it takes the binary, arguments and then returns list of tests+tags+other relevant data. The old discovery method will use the same approximate parsing it does now, new discovery code will use JSON for perfect parsing, but the actual registration code that generates the CTest scripts can be shared.

LecrisUT commented 3 months ago

I don't want to just tell people that they have to keep around the scripts from the older releases, because the newer ones won't work for them, so I think the second option is preferable. Especially since with some work, the discovery methods can be encapsulated into simple functions -> it takes the binary, arguments and then returns list of tests+tags+other relevant data. The old discovery method will use the same approximate parsing it does now, new discovery code will use JSON for perfect parsing, but the actual registration code that generates the CTest scripts can be shared.

Yeah, I think I can clean that up a bit and make it more version dependent. I'll try to look into it a bit later in the week. If you have the time to test any of the recent changes, that will help things there.

CMake JSON functions can take lists of arguments, so e.g. string(JSON tags GET ${json_tags} 0 1 2 3 4) returns list of parsed tags from json_tags on indices 0, 1, ..., 4 in tags. This provides much better performance, because it avoids a lot of the JSON reparsing issue.

Interesting I will look into it. From how I read the documentation, it seemed to populate as a JSON list, which would defeat the purpose there.

My intuition about REMOVE was that it would stop parsing at the first element and then copy the rest. But internally CMake uses nlohmann::json, so probably that would not help

About why I choose LTS version minimum, it is because for the user, they should do pip install cmake ninja. The only environments that are affected by CMake version are packagers, and I try to strike a balance on availability and maintainability. But this is a project-by-project preference, and I will adapt to whichever preference there is for this point.

horenmar commented 3 months ago

Interesting I will look into it. From how I read the documentation, it seemed to populate as a JSON list, which would defeat the purpose there.

Oh, I misunderstood how it works as well, and the actual answer is kinda stupid.

set(INDICES "0" "0")
string(JSON tags GET "[[10, 12], 1, 2, 3]" ${INDICES})
message(STATUS ${tags})

cmake -P json-test.cmake
-- 10

so the list is actually nested indices/keys. That's... not great. But also it means that all that can be skipped for now.

LecrisUT commented 3 months ago

Actually maybe using those will speed up the script part if it means CMake does not need to copy so much string back-and-forth and it can all be handled on the nlohmann::json level. It will at least make things more readable. Thanks for letting me know about that.

Bidski commented 1 month ago

Are there any updates on this?

I made a quick CMake reporter to generate CMake-style lists and then parse that output in CMake to add the tags as CTest labels. It seems to work reasonably well and does not appear to be too slow. Is this an acceptable implementation or would you still prefer to see this JSON version work?

LecrisUT commented 1 month ago

Interesting, can you elaborate on how that works? I think having a CMake native format will be preferred, and the main thing we are worried here is the performance of CMake's json parsing

Bidski commented 1 month ago

For the DiscoverTests test in ./tests/TestScripts/DiscoverTests this is the output we would get

$ ./debug-build/tests/ctest-registration-test/tests --reporter cmake --list-tests
@Script[C:\EPM1A]=x\\;"SCALA_ZERO:"\;\;/home/abiddulph/Projects/Catch2/tests/TestScripts/DiscoverTests/register-tests.cpp\;11\;script regressions;Some test\;\;/home/abiddulph/Projects/Catch2/tests/TestScripts/DiscoverTests/register-tests.cpp\;12\;;Let's have a test case with a long name. Longer. No, even longer. Really looooooooooooong. Even longer than that. Multiple lines worth of test name. Yep, like this.\;\;/home/abiddulph/Projects/Catch2/tests/TestScripts/DiscoverTests/register-tests.cpp\;13\;;And now a test case with weird tags.\;\;/home/abiddulph/Projects/Catch2/tests/TestScripts/DiscoverTests/register-tests.cpp\;16\;foo,bar,tl\\;dr,tl\\;dw

It is structured, effectively, as a list of lists. The first element of the outer list (in this case) is @Script[C:\EPM1A]=x\\;"SCALA_ZERO:"\;\;/home/abiddulph/Projects/Catch2/tests/TestScripts/DiscoverTests/register-tests.cpp\;11\;script regressions which has

the test name @Script[C:\EPM1A]=x\\;"SCALA_ZERO:",
the class name `` (empty for this test case),
the file name /path/to/Catch2/tests/TestScripts/DiscoverTests/register-tests.cpp,
the line number 11, and
the tags as comma separated values script regressions

If there are no tags for a particular test case then the tags will be empty as well.

You can then just foreach over the output of --list-tests to get each test case and then extract out the pertinent elements (0 and 4 for this) and parse as needed.

LecrisUT commented 1 month ago

It looks like it could work more efficiently until CMake provides more efficient Json parsing. The only comment is about manually escaping vs [==[ literals. I think we want te later, at least for name. Can you open a PR for that approach?

catchorg / Catch2

Cmake labels (take-over #2690) #2832