Gersemi does not provide formatted output when no changes were made

PatTheMav commented 5 months ago

I've stumbled over a little problem with gersemi in comparison to clang-format, swift-format, and other formatters:

Those formatters always output the input file, representing the "formatted" variant even if no formatting changes where necessary. gersemi instead outputs nothing.

Context: I have a pre-commit hook that iterates over all changed files matching suffixes I have formatters for and effectively runs gersemi "${FILE}" | { diff -u "${FILE}" - || :; } | sed -e "1s|--- |--- a/|" -e "2s|+++ -|+++ b/${FILE}|" to generate a patch file with the changes.

Because gersemi does not output anything on a file that is correctly formatted, the stdin for the diff command is empty and the generated patch effectively deletes the entire file content.

IMO it would be beneficial if gersemi would behave like other code-formatters in this case.

PatTheMav commented 5 months ago

As a workaround I'm now storing the subset of changed CMake files in a diff in a separate array and then pass that directly to gersemi with the --diff argument. This is arguably faster as it avoids the overhead of running a new python process for every file, but I think it would still be beneficial if the behaviour is similar to other formatters.

BlankSpruce commented 5 months ago

Unfortunately as far as I can tell gersemi does produce the output when not used in conjuction with -i/--in-place or -c/--check. This is my reproduction script:

#!/bin/bash

function log() {
    echo "[reproduction]" $@
}

log "Create CMake file"
cat <<EOF > /tmp/gersemi-29-repro-good.cmake
message(STATUS foo)

if(TRUE)
    message(STATUS foo)
endif()
EOF

cat <<EOF > /tmp/gersemi-29-repro-bad.cmake
message(STATUS foo   )

if(TRUE)
    message(STATUS foo    )
endif()
EOF

log "Print files"
log "-- good"
cat /tmp/gersemi-29-repro-good.cmake
log "-- bad"
cat /tmp/gersemi-29-repro-bad.cmake

log "Run gersemi in stdout mode"
log "-- good"
gersemi /tmp/gersemi-29-repro-good.cmake
log "-- bad"
gersemi /tmp/gersemi-29-repro-bad.cmake

log "Pipe cat stdout to count lines on good file"
cat /tmp/gersemi-29-repro-good.cmake | wc -l

log "Pipe gersemi stdout to count lines on good file"
gersemi /tmp/gersemi-29-repro-good.cmake | wc -l

log "Run command as in report on good file"
FILE=/tmp/gersemi-29-repro-good.cmake
gersemi "${FILE}" | { diff -u "${FILE}" - || :; } | sed -e "1s|--- |--- a/|" -e "2s|+++ -|+++ b/${FILE}|"

log "Run command as in report on bad file"
FILE=/tmp/gersemi-29-repro-bad.cmake
gersemi "${FILE}" | { diff -u "${FILE}" - || :; } | sed -e "1s|--- |--- a/|" -e "2s|+++ -|+++ b/${FILE}|"

I see that your command didn't produce anything but I will also admit that I'm not that familiar with shell syntax and I don't know what happens in some-command | { what-happens-here } | some-other command so if it's anyway related to this construct I'll look into it on the weekend. If there's a bug my intention is to have that "stdout" mode working.

PatTheMav commented 5 months ago

It's quite simple to compare - if you run clang-format or swift-format on a file that is already properly formatted it will just print the file to stdout as-is (i.e., clang-format <some_file>. No additional arguments needed.

If you do the same with gersemi it just prints nothing.

Here's the contents of a properly formatted CMakeLists.txt, that - when invoked via gersemi CMakeLists.txt prints nothing:

cmake_minimum_required(VERSION 3.24...3.28)

foreach(policy_num RANGE 0 151 1)
    string(REGEX REPLACE "^.*(....)\$" "\\1" policy_num "0000${policy_num}")
    set(policy_id "CMP${policy_num}")
    cmake_policy(GET ${policy_id} policy_result)

    message(STATUS "Value for ${policy_id}: ${policy_result}")
endforeach()

project(cmake-policy-test VERSION 0.0.1)

return()

PatTheMav commented 5 months ago

I see that your command didn't produce anything but I will also admit that I'm not that familiar with shell syntax and I don't know what happens in some-command | { what-happens-here } | some-other command so if it's anyway related to this construct I'll look into it on the weekend. If there's a bug my intention is to have that "stdout" mode working.

The { diff -u "${FILE}" - || :; } construct just ensures that if the diff command fails that no bad input is piped into sed after - as it is part of a conditional expression, the right side of the expression is then evaluated and as : is a "no-op" in Bash/Zsh it will always yield a zero return code and thus the command group enclosed in curly braces will always have a zero return code.

(The semicolon is necessary to turn the closing curly brace into the first token in a new command, otherwise it'd not be recognised by the parser as the end of the command group).

So sed either gets an empty string and does nothing - or it gets the expected diff string for which its matching expressions work.

BlankSpruce commented 5 months ago

It's quite simple to compare - if you run clang-format or swift-format on a file that is already properly formatted it will just print the file to stdout as-is (i.e., clang-format <some_file>. No additional arguments needed.

If you do the same with gersemi it just prints nothing.

Here's the contents of a properly formatted CMakeLists.txt, that - when invoked via gersemi CMakeLists.txt prints nothing:
cmake_minimum_required(VERSION 3.24...3.28)

foreach(policy_num RANGE 0 151 1)
    string(REGEX REPLACE "^.*(....)\$" "\\1" policy_num "0000${policy_num}")
    set(policy_id "CMP${policy_num}")
    cmake_policy(GET ${policy_id} policy_result)

    message(STATUS "Value for ${policy_id}: ${policy_result}")
endforeach()

project(cmake-policy-test VERSION 0.0.1)

return()

I'm aware of clang-format behaviour and I tried to model that myself. I'm afraid we're missing some detail.

PatTheMav commented 5 months ago

In my tests I was only able to get it to print the file after exporting the default settings to .gersemirc next to the CMakeLists.txt.

Seems to be that quiet is implicitly true by default and thus no output happens?

BlankSpruce commented 5 months ago

quiet is False by default. Perhaps you have .gersemirc in any of the directory above directory of formatted file that interferes with your test.

BlankSpruce commented 5 months ago

Forgot to mention, quiet is only meant to impact stderr. Stdout is still present in my setup:

PatTheMav commented 5 months ago

The only other .gersemirc on my system is in an unrelated directory tree. I also created a new terminal session and it did print the contents for the first few attempts, it then just stopped doing so.

Also when running with --quiet it did print the contents, when running without it, nothing was printed again:

BlankSpruce commented 5 months ago

It sounds like some really obscure thing. If you managed to demonstrate some script to reproduce that, even occasionally, that'd be extremely helpful. It can involve pre-commit hook if it's easier to reproduce that way. Consider doing something like I've tried here where I prepared dummy repository.

PatTheMav commented 5 months ago

See above, any kind of invocation can exhibit the issue.

What I've found is that running touch on the file makes gersemi print it again. When running gersemi --check on it, it stops (so first run with --check, then without - the second time, nothing is printed).

BlankSpruce commented 5 months ago

The last comment was the bull's eye, indeed after one invocation of --check it stops printing.

BlankSpruce commented 5 months ago

0.13.2 got released with the fix. Thanks for this report and help narrowing down the issue.

PatTheMav commented 5 months ago

@BlankSpruce interestingly this also fixed another bug I encountered:

CI complained about files that weren't correctly formatted but whenever I ran gersemi --check locally, it never complained.

With the bugfix they are now finally marked as requiring formatting changes locally as well.

BlankSpruce commented 5 months ago

For your information I'm caching results of invocation with -c or -i in <wherever-is-cache-directory-in-your-OS>/gersemi/<version>/cache.db to avoid rechecking already formatted files. I'm guessing your CI pipeline has clean cache directory on each run and it was the reason for that behaviour you observed.

PatTheMav commented 4 months ago

Alas it seems I'm running into issues surrounding the caching again:

With version 0.14.0 gersemi emits warnings when custom functions are used (which breaks our tooling because we parse gersemi's output to check for files that require actual formatting changes, so now also correctly formatted files are warned about, but that's another discussion)
This seems to only happen with uncached files though - in our case the custom function is used by all our subproject, but gersemi only prints the warning for uncached files.

IMO there really needs to be a switch to force gersemi to either clean its cache or to force-check, because this makes gersemi's behaviour non-deterministic and allows files to pass pre-commit checks that then fail checks on CI (this has happened multiple times now which is why the update to gersemi hasn't been merged in our project yet).

BlankSpruce commented 4 months ago

(which breaks our tooling because we parse gersemi's output to check for files that require actual formatting changes, so now also correctly formatted files are warned about, but that's another discussion)

If you are parsing the output it seems that looking for would be reformatted string should be enough and you could easily discard any other warning. If you need something more specific (like some output format) then consider opening issue with details.

This seems to only happen with uncached files though - in our case the custom function is used by all our subproject, but gersemi only prints the warning for uncached files.

I think I understand where's the issue on my part. I didn't consider file with missing definition warning as one that's "not formatted". I will probably fix that by not caching such result.

IMO there really needs to be a switch to force gersemi to either clean its cache or to force-check, because this makes gersemi's behaviour non-deterministic and allows files to pass pre-commit checks that then fail checks on CI (this has happened multiple times now which is why the update to gersemi hasn't been merged in our project yet).

Reasonable request. I will probably introduce --disable-cache or something along these lines as an additional defense mechanism but I'd discourage using that locally because the whole point is to have formatting as cheap as possible. I'm not saying this is the desired strategy but having some CI pipelines in the wild catching those kind of bugs make it possible for me to even be aware of that since the userbase is too small to reliably and early report these bugs.

I will probably deal with these problems on Saturday but no hard promises.

PatTheMav commented 4 months ago

I think in this specific case it is CI that behaves correctly (every runner is "fresh" and thus has no cache) so it catches warnings at all times.

It is the local check that will confuse users as they will obviously just re-run gersemi to format the offending file, but gersemi won't do anything (because it was the check that failed, not the formatting) and checking the file won't yield any output because of caching.

As I said on the issue I just opened, I'm not sure whether introducing additional conditions that constitute a "failed" check is good for a format checker. That's "linter" territory (the old cmake-format had a separate cmake-lint program for that) and it conflates these different concerns.

BlankSpruce commented 4 months ago

About CI this is exactly what I meant. I'd see the use of such --disable-cache flag in CI pipelines but only there.

As I said on the issue I just opened, I'm not sure whether introducing additional conditions that constitute a "failed" check is good for a format checker. That's "linter" territory (the old cmake-format had a separate cmake-lint program for that) and it conflates these different concerns.

I see some merit in your point but I'd argue that if formatter doesn't "understand" some custom commands it can't be definitively said that file is formatted. Before 0.14.0 I was treading on a territory "there are multiple ways for file to be considered formatted" but now I intend to converge on "there should be only one way* for file to be formatted".

* - within project context and with concrete settings like line_length

BlankSpruce commented 4 months ago

About CI this is exactly what I meant. I'd see the use of such --disable-cache flag in CI pipelines but only there.

PS. And maybe as a mean to debug the issue.

PatTheMav commented 4 months ago

About CI this is exactly what I meant. I'd see the use of such --disable-cache flag in CI pipelines but only there.

As I said on the issue I just opened, I'm not sure whether introducing additional conditions that constitute a "failed" check is good for a format checker. That's "linter" territory (the old cmake-format had a separate cmake-lint program for that) and it conflates these different concerns.

I see some merit in your point but I'd argue that if formatter doesn't "understand" some custom commands it can't be definitively said that file is formatted. Before 0.14.0 I was treading on a territory "there are multiple ways for file to be considered formatted" but now I intend to converge on "there should be only one way* for file to be formatted".

within project context and with concrete settings like line_length

Yeah I expected that to be the reasoning behind the change, though I would argue it is fine if

Gersemi applies one default formatting for unknown functions
Gersemi applies custom formatting for known functions only
Gersemi allows users to provide "function argument specifications" for custom functions

CMake is a peculiar beast here because of the way its function signatures are designed, which is made explicit by cmake_parse_arguments and requires user input to "describe" a function's arguments. I dunno whether coming up with a bespoke "argument specification" for the config file is better than just providing CMake files that contain these specifications via the definitions setting (see my other issue about that).

Again, this is how cmake-format used to work - it applied default argument formatting for unknown functions and required the user to actively describe custom functions to make it format it the way users desired. It was deterministic and stable behaviour.

BlankSpruce / gersemi

Gersemi does not provide formatted output when no changes were made #29