Closed PatTheMav closed 5 months ago
As a workaround I'm now storing the subset of changed CMake files in a diff in a separate array and then pass that directly to gersemi
with the --diff
argument. This is arguably faster as it avoids the overhead of running a new python process for every file, but I think it would still be beneficial if the behaviour is similar to other formatters.
Unfortunately as far as I can tell gersemi does produce the output when not used in conjuction with -i
/--in-place
or -c
/--check
. This is my reproduction script:
#!/bin/bash
function log() {
echo "[reproduction]" $@
}
log "Create CMake file"
cat <<EOF > /tmp/gersemi-29-repro-good.cmake
message(STATUS foo)
if(TRUE)
message(STATUS foo)
endif()
EOF
cat <<EOF > /tmp/gersemi-29-repro-bad.cmake
message(STATUS foo )
if(TRUE)
message(STATUS foo )
endif()
EOF
log "Print files"
log "-- good"
cat /tmp/gersemi-29-repro-good.cmake
log "-- bad"
cat /tmp/gersemi-29-repro-bad.cmake
log "Run gersemi in stdout mode"
log "-- good"
gersemi /tmp/gersemi-29-repro-good.cmake
log "-- bad"
gersemi /tmp/gersemi-29-repro-bad.cmake
log "Pipe cat stdout to count lines on good file"
cat /tmp/gersemi-29-repro-good.cmake | wc -l
log "Pipe gersemi stdout to count lines on good file"
gersemi /tmp/gersemi-29-repro-good.cmake | wc -l
log "Run command as in report on good file"
FILE=/tmp/gersemi-29-repro-good.cmake
gersemi "${FILE}" | { diff -u "${FILE}" - || :; } | sed -e "1s|--- |--- a/|" -e "2s|+++ -|+++ b/${FILE}|"
log "Run command as in report on bad file"
FILE=/tmp/gersemi-29-repro-bad.cmake
gersemi "${FILE}" | { diff -u "${FILE}" - || :; } | sed -e "1s|--- |--- a/|" -e "2s|+++ -|+++ b/${FILE}|"
I see that your command didn't produce anything but I will also admit that I'm not that familiar with shell syntax and I don't know what happens in some-command | { what-happens-here } | some-other command
so if it's anyway related to this construct I'll look into it on the weekend. If there's a bug my intention is to have that "stdout" mode working.
It's quite simple to compare - if you run clang-format
or swift-format
on a file that is already properly formatted it will just print the file to stdout as-is (i.e., clang-format <some_file>
. No additional arguments needed.
If you do the same with gersemi
it just prints nothing.
Here's the contents of a properly formatted CMakeLists.txt
, that - when invoked via gersemi CMakeLists.txt
prints nothing:
cmake_minimum_required(VERSION 3.24...3.28)
foreach(policy_num RANGE 0 151 1)
string(REGEX REPLACE "^.*(....)\$" "\\1" policy_num "0000${policy_num}")
set(policy_id "CMP${policy_num}")
cmake_policy(GET ${policy_id} policy_result)
message(STATUS "Value for ${policy_id}: ${policy_result}")
endforeach()
project(cmake-policy-test VERSION 0.0.1)
return()
I see that your command didn't produce anything but I will also admit that I'm not that familiar with shell syntax and I don't know what happens in
some-command | { what-happens-here } | some-other command
so if it's anyway related to this construct I'll look into it on the weekend. If there's a bug my intention is to have that "stdout" mode working.
The { diff -u "${FILE}" - || :; }
construct just ensures that if the diff
command fails that no bad input is piped into sed
after - as it is part of a conditional expression, the right side of the expression is then evaluated and as :
is a "no-op" in Bash/Zsh it will always yield a zero return code and thus the command group enclosed in curly braces will always have a zero return code.
(The semicolon is necessary to turn the closing curly brace into the first token in a new command, otherwise it'd not be recognised by the parser as the end of the command group).
So sed
either gets an empty string and does nothing - or it gets the expected diff string for which its matching expressions work.
It's quite simple to compare - if you run
clang-format
orswift-format
on a file that is already properly formatted it will just print the file to stdout as-is (i.e.,clang-format <some_file>
. No additional arguments needed.If you do the same with
gersemi
it just prints nothing.Here's the contents of a properly formatted
CMakeLists.txt
, that - when invoked viagersemi CMakeLists.txt
prints nothing:cmake_minimum_required(VERSION 3.24...3.28) foreach(policy_num RANGE 0 151 1) string(REGEX REPLACE "^.*(....)\$" "\\1" policy_num "0000${policy_num}") set(policy_id "CMP${policy_num}") cmake_policy(GET ${policy_id} policy_result) message(STATUS "Value for ${policy_id}: ${policy_result}") endforeach() project(cmake-policy-test VERSION 0.0.1) return()
I'm aware of clang-format behaviour and I tried to model that myself. I'm afraid we're missing some detail.
In my tests I was only able to get it to print the file after exporting the default settings to .gersemirc
next to the CMakeLists.txt
.
Seems to be that quiet
is implicitly true
by default and thus no output happens?
quiet
is False by default. Perhaps you have .gersemirc
in any of the directory above directory of formatted file that interferes with your test.
Forgot to mention, quiet
is only meant to impact stderr. Stdout is still present in my setup:
The only other .gersemirc
on my system is in an unrelated directory tree. I also created a new terminal session and it did print the contents for the first few attempts, it then just stopped doing so.
Also when running with --quiet
it did print the contents, when running without it, nothing was printed again:
It sounds like some really obscure thing. If you managed to demonstrate some script to reproduce that, even occasionally, that'd be extremely helpful. It can involve pre-commit hook if it's easier to reproduce that way. Consider doing something like I've tried here where I prepared dummy repository.
See above, any kind of invocation can exhibit the issue.
What I've found is that running touch
on the file makes gersemi
print it again. When running gersemi --check
on it, it stops (so first run with --check
, then without - the second time, nothing is printed).
The last comment was the bull's eye, indeed after one invocation of --check
it stops printing.
0.13.2 got released with the fix. Thanks for this report and help narrowing down the issue.
@BlankSpruce interestingly this also fixed another bug I encountered:
CI complained about files that weren't correctly formatted but whenever I ran gersemi --check
locally, it never complained.
With the bugfix they are now finally marked as requiring formatting changes locally as well.
For your information I'm caching results of invocation with -c
or -i
in <wherever-is-cache-directory-in-your-OS>/gersemi/<version>/cache.db
to avoid rechecking already formatted files. I'm guessing your CI pipeline has clean cache directory on each run and it was the reason for that behaviour you observed.
Alas it seems I'm running into issues surrounding the caching again:
gersemi
emits warnings when custom functions are used (which breaks our tooling because we parse gersemi
's output to check for files that require actual formatting changes, so now also correctly formatted files are warned about, but that's another discussion)gersemi
only prints the warning for uncached files.IMO there really needs to be a switch to force gersemi
to either clean its cache or to force-check, because this makes gersemi
's behaviour non-deterministic and allows files to pass pre-commit checks that then fail checks on CI (this has happened multiple times now which is why the update to gersemi
hasn't been merged in our project yet).
(which breaks our tooling because we parse gersemi's output to check for files that require actual formatting changes, so now also correctly formatted files are warned about, but that's another discussion)
If you are parsing the output it seems that looking for would be reformatted
string should be enough and you could easily discard any other warning. If you need something more specific (like some output format) then consider opening issue with details.
This seems to only happen with uncached files though - in our case the custom function is used by all our subproject, but gersemi only prints the warning for uncached files.
I think I understand where's the issue on my part. I didn't consider file with missing definition warning as one that's "not formatted". I will probably fix that by not caching such result.
IMO there really needs to be a switch to force gersemi to either clean its cache or to force-check, because this makes gersemi's behaviour non-deterministic and allows files to pass pre-commit checks that then fail checks on CI (this has happened multiple times now which is why the update to gersemi hasn't been merged in our project yet).
Reasonable request. I will probably introduce --disable-cache
or something along these lines as an additional defense mechanism but I'd discourage using that locally because the whole point is to have formatting as cheap as possible. I'm not saying this is the desired strategy but having some CI pipelines in the wild catching those kind of bugs make it possible for me to even be aware of that since the userbase is too small to reliably and early report these bugs.
I will probably deal with these problems on Saturday but no hard promises.
I think in this specific case it is CI that behaves correctly (every runner is "fresh" and thus has no cache) so it catches warnings at all times.
It is the local check that will confuse users as they will obviously just re-run gersemi
to format the offending file, but gersemi
won't do anything (because it was the check that failed, not the formatting) and checking the file won't yield any output because of caching.
As I said on the issue I just opened, I'm not sure whether introducing additional conditions that constitute a "failed" check is good for a format checker. That's "linter" territory (the old cmake-format
had a separate cmake-lint
program for that) and it conflates these different concerns.
About CI this is exactly what I meant. I'd see the use of such --disable-cache
flag in CI pipelines but only there.
As I said on the issue I just opened, I'm not sure whether introducing additional conditions that constitute a "failed" check is good for a format checker. That's "linter" territory (the old cmake-format had a separate cmake-lint program for that) and it conflates these different concerns.
I see some merit in your point but I'd argue that if formatter doesn't "understand" some custom commands it can't be definitively said that file is formatted. Before 0.14.0 I was treading on a territory "there are multiple ways for file to be considered formatted" but now I intend to converge on "there should be only one way* for file to be formatted".
* - within project context and with concrete settings like line_length
About CI this is exactly what I meant. I'd see the use of such --disable-cache flag in CI pipelines but only there.
PS. And maybe as a mean to debug the issue.
About CI this is exactly what I meant. I'd see the use of such
--disable-cache
flag in CI pipelines but only there.As I said on the issue I just opened, I'm not sure whether introducing additional conditions that constitute a "failed" check is good for a format checker. That's "linter" territory (the old cmake-format had a separate cmake-lint program for that) and it conflates these different concerns.
I see some merit in your point but I'd argue that if formatter doesn't "understand" some custom commands it can't be definitively said that file is formatted. Before 0.14.0 I was treading on a territory "there are multiple ways for file to be considered formatted" but now I intend to converge on "there should be only one way* for file to be formatted".
- within project context and with concrete settings like
line_length
Yeah I expected that to be the reasoning behind the change, though I would argue it is fine if
CMake is a peculiar beast here because of the way its function signatures are designed, which is made explicit by cmake_parse_arguments
and requires user input to "describe" a function's arguments. I dunno whether coming up with a bespoke "argument specification" for the config file is better than just providing CMake files that contain these specifications via the definitions
setting (see my other issue about that).
Again, this is how cmake-format
used to work - it applied default argument formatting for unknown functions and required the user to actively describe custom functions to make it format it the way users desired. It was deterministic and stable behaviour.
I've stumbled over a little problem with
gersemi
in comparison toclang-format
,swift-format
, and other formatters:Those formatters always output the input file, representing the "formatted" variant even if no formatting changes where necessary.
gersemi
instead outputs nothing.Context: I have a pre-commit hook that iterates over all changed files matching suffixes I have formatters for and effectively runs
gersemi "${FILE}" | { diff -u "${FILE}" - || :; } | sed -e "1s|--- |--- a/|" -e "2s|+++ -|+++ b/${FILE}|"
to generate a patch file with the changes.Because
gersemi
does not output anything on a file that is correctly formatted, thestdin
for thediff
command is empty and the generated patch effectively deletes the entire file content.IMO it would be beneficial if
gersemi
would behave like other code-formatters in this case.