bats-core / bats-support

Supporting library for Bats test helpers
Creative Commons Zero v1.0 Universal
41 stars 18 forks source link

How Should `batslib_count_lines` Handle Trailing Newlines? #11

Open CauhxMilloy opened 4 months ago

CauhxMilloy commented 4 months ago

Currently, batslib_count_lines is written to try and ignore the last newline to be equivalent to it not being there.

This leads '' and '\n' as both being "0 line". Similarly, 'a' and 'a\n' will both be considered as "1 line". This is explicitly checked in tests.

Issue

However, when using run --keep-empty-lines, assertions will check trailing newlines. However, this leads batslib_is_single_line to still return 0, and the error messages will be printed by batslib_print_kv_single instead of batslib_print_kv_multi (which shows the line numbers, a helpful indicator for debugging). Example (using bats-assert):

@test "trailing empty line" {
  run --keep-empty-lines printf 'a\n'
  assert_output "$(printf 'a')"
}

prints:

-- output differs --
expected : a
actual   : a

--

The new line is there, but it's hard to really notice. More egregiously, a multi-line output will be very misleading. Example:

@test "trailing empty line on multiline" {
  run --keep-empty-lines printf 'a\nb\n'
  assert_output "$(printf 'a\nb')"
}

prints:

-- output differs --
expected (2 lines):
a
b
actual (2 lines):
a
b
--  

This both hides the extra new line and states that the line counts are the same.

Discussion

I'm writing out this GH Issue (and not just sending some PR(s)) because there's a few ways that this could possibly be addressed. I wanted to chat this over and see which way you'd think would be best. I'm happy to put together a PR for wherever we land.

Here's some options that I've considered, feel free to suggest more:

1. Update the impl for batslib_count_lines.

This could be done by changing

batslib_count_lines() {
  local -i n_lines=0
  local line
  while IFS='' read -r line || [[ -n $line ]]; do
    (( ++n_lines ))
  done < <(printf '%s' "$1")
  echo "$n_lines"
}

to

batslib_count_lines() {
  local -r input="$1"
  local -i n_lines=0
  local line
  if [ -n "$input" ]; then
    while IFS='' read -r line; do
      (( ++n_lines ))
    done < <(printf '%s' "$input")
    # Increment for the EOF (not captured by the loop above).
    (( ++n_lines ))
  fi
  echo "$n_lines"
}

There are some related extra changes that are necessary with this approach. batslib_prefix needs to be updated in a similar way (else batslib_print_kv_multi in batslib_print_kv_single_or_multi will print the incorrect number of lines). Potential impl change from

batslib_prefix() {
  local -r prefix="${1:-  }"
  local line
  while IFS='' read -r line || [[ -n $line ]]; do
    printf '%s%s\n' "$prefix" "$line"
  done
}

to

batslib_prefix() {
  local -r prefix="${1:-  }"
  local line=''
  local -i lines_read=0
  while IFS='' read -r line; do
    printf '%s%s\n' "$prefix" "$line"
    (( ++lines_read ))
  done
  # Print for the line with EOF (not captured by the loop above).
  # Do not print if stdin was completely empty.
  if (( lines_read > 0 )) || [[ -n "$line" ]]; then
    printf '%s%s\n' "$prefix" "$line"
  fi
}

batslib_mark would likely also need to be updated in a similar way (these 3 functions all use while IFS='' read -r line || [[ -n $line ]]; do).

While I think this is pretty straightforward (and my personal preference), I understand that this may have backwards compatibility issues and may have implications for other functionality built upon this function (which may really care about ignoring trailing newlines).

I can see that bats-assert makes use of these 3 functions in refute_line. The suggested changes to batslib_prefix and batslib_mark wouldn't really change anything. Though the indirect change to batslib_is_single_line (via the change to batslib_count_lines) might lead to some more calls being made to batslib_print_kv_multi instead of batslib_print_kv_single. I'd suspect that wouldn't affect most cases, as the trailing newline is removed unless explicitly forced to be kept (which is likely desirable).

2. Add --keep-empty-lines (or similar) to batslib_count_lines.

This option keeps the old functionality and allows the caller to change into tracking all newlines. This could either be done as opt-in with --keep-empty-lines or --include-trailing-newline; or it could be opt-out, where the default is to include it and --ignore-trailing-newline is added.

This option (specifically the opt-in version) is more backwards compatible with existing logic. However, it does require adding/piping the flag through various functions, which can be a bit verbose. Concerns about differentiating --keep... from intended input can be worked-around via there should always be an even number of positional arguments to these specific functions.

3. Create alternative batslib_count_lines function.

In this option, a new function (e.g. batslib_count_all_lines) can be added to use the new logic. This is very backwards compatible, as it does not touch the existing batslib_count_lines function. However, it would require doing the same for other functions like batslib_is_single_line, batslib_print_kv_multi, batslib_print_kv_single_or_multi, etc.


Those are the suggestions that I have thus far. As I mentioned, I think option 1 is the best option. Let me know what you think.

Thanks!

P.S. There's a psudo-related TODO(ztombol) on batslib_count_lines about fixing inconsistencies vs ${#lines[@]}. This isn't really the same issue, but it's somewhat related. Thought I'd mention it.

CauhxMilloy commented 4 months ago

As another alternative option:

4. Auto-detect which logic to use.

batslib_count_lines could potentially auto-detect whether to keep empty lines or not based on the input string. If the input variable has trailing newline(s), then they must have been manually kept (via something like --keep-empty-lines). This could be checked by something like [[ "${input: -1}" = $'\n' ]]. If there is a newline, then use the logic given in Option 1. If there is no newline, use the existing logic. I think this might fix the issue and be backwards compatible.

While this option offers a more seamless approach for batslib_count_lines, there still remains the issue of batslib_prefix (which affects batslib_count_lines in batslib_print_kv_single_or_multi). To address this issue, either Option 1 (always print trailing new line if non-empty string) or Option 2 (add a --keep-empty-lines option) would be a fine choice to update batslib_prefix. Option 2 is likely better for this scenario, as batslib_print_kv_single_or_multi can optionally pass it for each value based on the same auto-detect logic ([[ "${pairs[$i]: -1}" = $'\n' ]]). Again, batslib_mark should be updated in the same way as batslib_prefix.