GrayJack / coreutils

Core utils re-implementation for UNIX/UNIX-like systems written in Rust
Mozilla Public License 2.0
106 stars 40 forks source link

Uniq: fix -s and implement -f flags #131

Closed marcospb19 closed 3 years ago

marcospb19 commented 3 years ago

There are some decisions I have to ask you first before completing and asking for a complete review:

  1. It seems that other implementations of uniq ignore the trailing '\n' at the end of the lines, but does not ignore '\r', what is the desired behavior for me to implement? (I know that this project is only meant to be used in Windows, but \r\n files are still possible).
  2. In Rust we usually deal with UTF-8 encoded text, it happens that the -s flag, commonly named --skip-chars, does not skip UTF-8 chars, but bytes instead (other implementations), see this example:
uniq -s 1
é verdade
ó verdade
a verdade

Uniq will display the 3 lines received.

The problem when comparing different slices of bytes is that:

let text = "não";
let slice = &text[2..];

Throws a runtime panic, because of the multi-byte char 'ã'.

Should I just work this around by using .as_bytes() instead of string slices?


About -s previous errors, the output wasn't showing the full line that first matched with the "pattern". I'll fix tests soon.

GrayJack commented 3 years ago
  1. Nowadays almost all editors supports windows like line-ending, so I think if a file has it, and it is followed by a "\n" it should be have the same behavior as "\n"

  2. There is something specifying that in the -s should consider only ASCII in the specification? If there isn't I think it should handle UTF-8 gracefully, otherwise, it would be a cool extension a new flag that handles UTF-8

I didn't had the time to look a the code yet, but I'll check as soon as possible

marcospb19 commented 3 years ago

@GrayJack, I had some time to work in fixing -s and implementing -f similar the GNU implementation of uniq.

This means that "\n" is different than "\r\n", and -s|skip-chars=N skips bytes instead (a char from the C language).

I'm sorry I don't have the time to change this :/ so I suggest that you could create an issue for each one if you want to.

GrayJack commented 3 years ago

bors r+

bors[bot] commented 3 years ago

Build succeeded: