mardukbp closed this issue 2 years ago.
Hey,
Thanks for the report. I'm not able to reproduce this since I'm on Linux, but here's what I've tried:
$ printf 'a\r\n' | muter -c hex
610d0a
$ printf 'a\r\n' | muter -c hex | muter -c -hex
My guess is that Windows somehow always appends a CRLF to standard output (or maybe standard input) if the output doesn't already end with one. If so, that's unfortunate, but internally we have a strict and a non-strict mode, so I think we can just swallow the CRLF bytes in non-strict mode.
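A minimal sketch of what swallowing CRLF in non-strict mode could look like. decode_hex and its strict flag are hypothetical illustrations, not muter's actual internals: strict mode rejects any byte that isn't a hex digit, while non-strict mode skips CR and LF.

```rust
/// Hypothetical helper: map one ASCII hex digit to its value.
fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical decoder: in strict mode every byte must be a hex digit;
/// in non-strict mode CR and LF bytes are silently skipped.
fn decode_hex(input: &[u8], strict: bool) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len() / 2);
    let mut high: Option<u8> = None; // first nibble of the current pair
    for (i, &b) in input.iter().enumerate() {
        if !strict && (b == b'\r' || b == b'\n') {
            continue; // swallow line endings in non-strict mode
        }
        let v = hex_value(b)
            .ok_or_else(|| format!("invalid byte 0x{:02x} at offset {}", b, i))?;
        match high.take() {
            None => high = Some(v),
            Some(h) => out.push((h << 4) | v),
        }
    }
    if high.is_some() {
        return Err("odd number of hex digits".to_string());
    }
    Ok(out)
}

fn main() {
    // muter's hex output followed by a CRLF that muter itself never wrote.
    let piped = b"610d0a\r\n";
    assert!(decode_hex(piped, true).is_err());
    assert_eq!(decode_hex(piped, false).unwrap(), b"a\r\n".to_vec());
}
```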
The Rust standard library documentation does say that, on Windows, standard output attached to a console only accepts valid UTF-8 byte sequences. That will cause problems in a number of cases if your data isn't valid UTF-8, but I think we can just document that as a limitation on Windows.
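To make the UTF-8 limitation concrete: hex-encoded output is plain ASCII and is therefore always valid UTF-8, but decoded output need not be. A small check using std::str::from_utf8 (the helper name is_valid_utf8 is hypothetical):

```rust
use std::str;

/// Hypothetical helper: would these bytes be accepted by a UTF-8-only sink?
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // Hex-encoded output is plain ASCII, so it is always valid UTF-8.
    assert!(is_valid_utf8(b"610d0a"));
    // CR and LF are ordinary single-byte UTF-8 characters.
    assert!(is_valid_utf8(b"a\r\n"));
    // A lone 0xff byte (e.g. the result of decoding the hex string "ff") is not.
    assert!(!is_valid_utf8(&[0xff]));
}
```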
Anyway, I'll try to get a patch out relatively soon with a fix for the hex codec and similar codecs. There are likely a couple that will need fixing.
On Windows, echo always appends a newline to its output. There is a way to prevent that, but it is not actually the problem. On Windows, lines (in files and in the shell) end with CRLF, a sequence of two bytes, each of which is valid UTF-8 on its own. Therefore, proper handling of textual data on Windows requires taking this into account. There is no fundamental limitation or conflict with Rust; Windows is just different from macOS and Linux. I use all three of them, which is why I am interested in muter functioning properly everywhere :)
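As an illustration that CRLF is not at odds with Rust itself: the standard library's str::lines already treats both "\n" and "\r\n" as line terminators. A short check:

```rust
fn main() {
    // A two-line text file as Notepad would save it (CRLF), and the same text with LF.
    let windows_text = "first line\r\nsecond line\r\n";
    let unix_text = "first line\nsecond line\n";

    // str::lines strips either "\n" or "\r\n" from each line.
    let a: Vec<&str> = windows_text.lines().collect();
    let b: Vec<&str> = unix_text.lines().collect();
    assert_eq!(a, vec!["first line", "second line"]);
    assert_eq!(a, b);

    // CRLF itself is just two bytes, 0x0d and 0x0a.
    assert_eq!("\r\n".as_bytes(), b"\x0d\x0a");
}
```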
echo appends a newline to its output on Linux as well. The problem you're seeing isn't echo. If it were echo, then we'd see an error from echo a | muter -c hex, which we don't. What we see here is that muter -c -hex receives a CRLF sequence that muter -c hex doesn't emit.
doesn't emit. In fact, on my Linux system, I use zsh, and I get the following:
printf 'a\r\n' | muter -c hex
610d0a%
That final % is actually printed in reverse video by zsh, because there's no newline at the end of the line. muter doesn't print any line endings by default (since sometimes line endings matter, or the output isn't text at all).
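A sketch of an encoder that, like muter by default, writes no line ending of its own (encode_hex is a hypothetical name). Because it uses print! rather than println!, running it from zsh should show the same trailing % marker.

```rust
use std::fmt::Write;

/// Hypothetical encoder: lowercase hex, with no line ending appended.
fn encode_hex(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len() * 2);
    for b in input {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(out, "{:02x}", b).unwrap();
    }
    out
}

fn main() {
    let encoded = encode_hex(b"a\r\n");
    assert_eq!(encoded, "610d0a");
    // print!, not println!: no newline is added to the output.
    print!("{}", encoded);
}
```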
So what I need to look into, which I will once I get a temporary Windows VM set up, is why we get an extra CRLF here that we're not supposed to get. I have some ideas about how to handle that, but I need to see what Windows does in this case in order to do some testing.
What shell are you using in this case? Is it CMD, PowerShell, Git Bash, or something else?
Yes, you are right. On Linux, echo appends \n to its output unless you pass the -n flag. As I said, on Windows echo appends \r\n to its output (in both CMD and PowerShell), so yes, it is supposed to be there. Likewise, encoding and decoding a text file on Windows fails for the same reason: on Windows, lines are separated by \r\n (CRLF). Just try creating a text file muter.txt containing two lines in Notepad and running muter -c hex muter.txt | muter -c -hex. You will get the same error.
muter -c -hex decodes only hex characters. It isn't designed to accept anything that is not a hex character, and the fact that it accepts a trailing newline isn't intended; in other words, it's a bug that this happens to work. I literally just discovered this a few minutes ago.
That's because by default muter operates in strict mode, where it's supposed to reject anything that isn't a valid character in the stream. There is a little bit of support for non-strict mode in the code, but I haven't finished it yet. It's tricky because if someone inserts a very large number of invalid characters into the stream, with the current design we might end up never making progress.
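One possible way around the progress problem, sketched under the assumption that input is read in fixed-size buffers: have the decoder report how many input bytes it consumed, counting the bytes it skipped, and carry a half-decoded pair over to the next buffer. HexDecoder, feed, and finish are hypothetical names, not muter's design.

```rust
/// Map one ASCII hex digit to its value.
fn hex_value(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Hypothetical streaming hex decoder.
struct HexDecoder {
    strict: bool,
    pending: Option<u8>, // high nibble carried over from the previous chunk
}

impl HexDecoder {
    fn new(strict: bool) -> Self {
        HexDecoder { strict, pending: None }
    }

    /// Decodes `chunk` into `out` and returns the number of input bytes consumed.
    /// In non-strict mode, invalid bytes are dropped but still counted, so even a
    /// chunk made entirely of junk advances the reader.
    fn feed(&mut self, chunk: &[u8], out: &mut Vec<u8>) -> Result<usize, String> {
        for &b in chunk {
            match hex_value(b) {
                Some(v) => match self.pending.take() {
                    None => self.pending = Some(v),
                    Some(h) => out.push((h << 4) | v),
                },
                None if self.strict => return Err(format!("invalid byte 0x{:02x}", b)),
                None => {} // non-strict: skip the byte
            }
        }
        Ok(chunk.len())
    }

    /// Called at end of input: a leftover nibble means an odd number of digits.
    fn finish(self) -> Result<(), String> {
        match self.pending {
            Some(_) => Err("odd number of hex digits".to_string()),
            None => Ok(()),
        }
    }
}

fn main() {
    let mut dec = HexDecoder::new(false);
    let mut out = Vec::new();
    // "61" is split across two buffers; the last buffer is nothing but line endings.
    let chunks: [&[u8]; 3] = [b"6", b"1\r\n", b"\r\n\r\n\r\n"];
    let mut consumed = 0;
    for &chunk in chunks.iter() {
        consumed += dec.feed(chunk, &mut out).unwrap();
    }
    dec.finish().unwrap();
    assert_eq!(out, b"a".to_vec());
    assert_eq!(consumed, 10);
}
```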
There are other codecs that do accept newlines or CRLF as part of the stream, like uri, since some characters may be encoded and others may not. Therefore, in some cases, someone could intentionally insert an LF or CRLF into the stream and want it to stay an LF or CRLF. However, an LF or CRLF is never part of a hex-encoded stream, so they're not supposed to be allowed there.
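To illustrate why uri is different: in a percent-encoded stream, %0D%0A stands for an encoded CRLF, but a literal LF is also legitimate data and passes through unchanged. A minimal decoder sketch (percent_decode is a hypothetical name, and real URI decoders handle more edge cases):

```rust
/// Hypothetical minimal percent-decoder: "%XX" becomes one byte, and every
/// other byte (including a raw LF or CRLF) is passed through unchanged.
fn percent_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    fn hex_value(b: u8) -> Option<u8> {
        (b as char).to_digit(16).map(|d| d as u8)
    }
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' {
            let hi = input.get(i + 1).copied().and_then(hex_value);
            let lo = input.get(i + 2).copied().and_then(hex_value);
            match (hi, lo) {
                (Some(h), Some(l)) => out.push((h << 4) | l),
                _ => return Err(format!("truncated or invalid escape at offset {}", i)),
            }
            i += 3;
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    Ok(out)
}

fn main() {
    // An encoded CRLF (%0D%0A) and a literal LF both survive decoding.
    let decoded = percent_decode(b"a%0D%0Ab\n").unwrap();
    assert_eq!(decoded, b"a\r\nb\n".to_vec());
}
```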
If you want to strip off trailing newlines or CRLF in the meantime, you can do this:
$ echo a | muter -c hex | muter -c -wrap:-hex
$ echo a | muter -c hex | muter -c -crlf:-wrap:-hex
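Roughly what the second command does, under assumptions about the transforms that are not taken from muter's source: -crlf turns CRLF into LF, -wrap removes line breaks, and -hex then decodes strictly. crlf_to_lf, unwrap_lines, and decode_hex_strict are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for -crlf: turn every CRLF pair into a lone LF.
fn crlf_to_lf(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    for (i, &b) in input.iter().enumerate() {
        // Drop a CR only when it is immediately followed by LF.
        let cr_before_lf = b == b'\r' && input.get(i + 1) == Some(&b'\n');
        if !cr_before_lf {
            out.push(b);
        }
    }
    out
}

/// Hypothetical stand-in for -wrap: remove line feeds.
fn unwrap_lines(input: &[u8]) -> Vec<u8> {
    input.iter().copied().filter(|&b| b != b'\n').collect()
}

/// Strict hex decode: any non-hex byte is an error.
fn decode_hex_strict(input: &[u8]) -> Result<Vec<u8>, String> {
    let digit = |b: u8| {
        (b as char)
            .to_digit(16)
            .map(|d| d as u8)
            .ok_or_else(|| format!("invalid byte 0x{:02x}", b))
    };
    if input.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    input
        .chunks(2)
        .map(|pair| -> Result<u8, String> { Ok((digit(pair[0])? << 4) | digit(pair[1])?) })
        .collect()
}

fn main() {
    // As reported in this thread: in PowerShell, echo a sends "a\r\n", muter
    // hex-encodes it, and the pipe adds a CRLF that muter never wrote.
    let piped = b"610d0a\r\n";
    assert!(decode_hex_strict(piped).is_err()); // a bare -hex rejects the CR
    let cleaned = unwrap_lines(&crlf_to_lf(piped));
    assert_eq!(decode_hex_strict(&cleaned).unwrap(), b"a\r\n".to_vec());
}
```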
What I've found is that this is intrinsically related to the process being run in a PowerShell or CMD window. When I run muter in one of those shells, the pipe always contains a CRLF at the end, even though muter doesn't output one. That's not the expected behavior, and it's why this error happens. The problem doesn't occur in a Git Bash window, so things work there.
It looks like this is a known issue with PowerShell. That's unfortunate, because muter is designed to work on streams of bytes and those bytes specifically don't have to be text at all.
I'll try to make this work a little better, but it may take me some time to get it finally sorted. I do want to point out that Windows isn't a supported platform for my projects and they aren't tested there, although I'll see what I can do to make it work as well as possible.
Sorry it's taken me so long to get back to this. I have a branch at https://github.com/bk2204/muter/tree/crlf-improvements which should improve some of this with the --no-strict flag. There's additional documentation in the manual page as well, outlining the workaround I gave for making this work with the existing version.
This should be fixed with d7f9152c9b7f2f21b84bed13ac60f9c70f17688f.
Thanks a lot for fixing this issue! I just tested it.
echo a | muter -c hex | muter -c -wrap:-hex
works as expected, but
echo a | muter -c hex | muter -c -crlf:-wrap:-hex
prints the usage instructions.
What specific output do you get when running echo a | muter -c hex | muter -c -crlf:-wrap:-hex? Can you copy and paste the output?
PS> echo a | muter -c hex | muter -c -crlf:-wrap:-hex
muter
Encodes and decodes byte sequences
USAGE:
    muter.exe [FLAGS] [OPTIONS] --chain <CHAIN> [INPUT]...

FLAGS:
    -h, --help       Prints help information
    -r, --reverse    Reverse transforms in both order and direction
    -V, --version    Prints version information

OPTIONS:
        --buffer-size <buffer-size>    Size of buffer
    -c, --chain <CHAIN>                List of transforms to perform

ARGS:
    <INPUT>...    Input files to process

Modify the bytes in the concatentation of INPUT (or standard input) by using the
specification in CHAIN.
CHAIN is a colon-separated list of encoding transform. A transform can be
prefixed with - to reverse it (if possible). A transform can be followed by one
or more comma-separated parenthesized arguments as well. Instead of
parentheses, a single comma may be used.
For example, '-hex:hash(sha256):base64' (or '-hex:hash,sha256:base64') decodes a
hex-encoded string, hashes it with SHA-256, and converts the result to base64.
If --reverse is specified, reverse the order of transforms in order and in sense.

The following transforms are available:
  ascii85
    bare      : do not use delimiters
  base16
    lower     : use lowercase letters
    upper     : use uppercase letters
  base32
    nopad     : do not pad incomplete sequences with =
    pad       : pad incomplete sequences with =
  base32hex
    nopad     : do not pad incomplete sequences with =
    pad       : pad incomplete sequences with =
  base64
    nopad     : do not pad incomplete sequences with =
    pad       : pad incomplete sequences with =
  bubblebabble
  checksum
    adler32   : use Adler32 as the checksum
    fletcher16: use Fletcher16 as the checksum
  crlf
  deflate
  form
    lower     : use lowercase letters
    upper     : use uppercase letters
  gzip
  hash
    blake2b   : use BLAKE2b as the hash
    blake2s   : use BLAKE2s as the hash
    length    : specify the digest length in bytes for BLAKE2b, BLAKE2s, and BLAKE3
    md5       : use MD5 as the hash
    sha1      : use SHA-1 as the hash
    sha224    : use SHA-224 as the hash
    sha256    : use SHA-256 as the hash
    sha3-224  : use SHA3-224 as the hash
    sha3-256  : use SHA3-256 as the hash
    sha3-384  : use SHA3-384 as the hash
    sha3-512  : use SHA3-512 as the hash
    sha384    : use SHA-384 as the hash
    sha512    : use SHA-512 as the hash
  hex
    lower     : use lowercase letters
    upper     : use uppercase letters
  identity
  lf
    empty     : print nothing if the input is empty
  modhex
  quotedprintable
    length    : wrap at specified line length (default 76; 0 disables)
  swab
    length    : handle chunks of this size
  uri
    lower     : use lowercase letters
    upper     : use uppercase letters
  url64
    nopad     : do not pad incomplete sequences with =
    pad       : pad incomplete sequences with =
  uuencode
  vis
    cstyle    : encode using C-like escape sequences
    glob      : encode characters recognized by glob(3) and hash mark
    nl        : encode newline
    octal     : encode using octal escape sequences
    sp        : encode space
    space     : encode space
    tab       : encode tab
    white     : encode space, tab, and newline
  wrap
    length    : wrap at specified line length (default 80)
  xml
    default   : use XML entity names
    hex       : use hexadecimal entity names for XML entities
    html      : use HTML-friendly entity names for XML entities
  zlib
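The CHAIN syntax described in this help text can be parsed along these lines. This is a minimal sketch, assuming arguments never contain ':' or parentheses and that the comma form (hash,sha256) takes the rest of the spec as comma-separated arguments; Transform, parse_chain, and reverse_chain are hypothetical, not muter's own parser. reverse_chain follows the description of --reverse and only flips the syntax; whether a reversed step actually exists is up to each transform.

```rust
use std::fmt;

/// One step of a chain such as "-hex:hash(sha256):base64".
#[derive(Debug, Clone, PartialEq)]
struct Transform {
    reverse: bool,     // prefixed with '-'
    name: String,      // e.g. "hash"
    args: Vec<String>, // e.g. ["sha256"]
}

impl fmt::Display for Transform {
    // Renders a transform back in the parenthesized form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.reverse {
            write!(f, "-")?;
        }
        write!(f, "{}", self.name)?;
        if !self.args.is_empty() {
            write!(f, "({})", self.args.join(","))?;
        }
        Ok(())
    }
}

/// Hypothetical parser for one transform spec.
fn parse_transform(spec: &str) -> Result<Transform, String> {
    let (reverse, rest) = match spec.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, spec),
    };
    let (name, args): (&str, Vec<String>) = if let Some(open) = rest.find('(') {
        // Parenthesized form: name(arg1,arg2)
        let inner = rest[open + 1..]
            .strip_suffix(')')
            .ok_or_else(|| format!("unbalanced parentheses in {:?}", spec))?;
        (&rest[..open], inner.split(',').map(String::from).collect())
    } else if let Some((name, tail)) = rest.split_once(',') {
        // Comma form: name,arg
        (name, tail.split(',').map(String::from).collect())
    } else {
        (rest, Vec::new())
    };
    if name.is_empty() {
        return Err(format!("empty transform name in {:?}", spec));
    }
    Ok(Transform { reverse, name: name.to_string(), args })
}

/// CHAIN is a colon-separated list of transforms.
fn parse_chain(chain: &str) -> Result<Vec<Transform>, String> {
    chain.split(':').map(parse_transform).collect()
}

/// Reverse the order of the transforms and flip each one's direction.
fn reverse_chain(chain: &[Transform]) -> Vec<Transform> {
    chain
        .iter()
        .rev()
        .map(|t| Transform { reverse: !t.reverse, ..t.clone() })
        .collect()
}

fn chain_to_string(chain: &[Transform]) -> String {
    chain.iter().map(|t| t.to_string()).collect::<Vec<_>>().join(":")
}

fn main() {
    let a = parse_chain("-hex:hash(sha256):base64").unwrap();
    let b = parse_chain("-hex:hash,sha256:base64").unwrap();
    assert_eq!(a, b); // both spellings from the help text parse identically
    assert_eq!(chain_to_string(&reverse_chain(&a)), "-base64:-hash(sha256):hex");
    println!("{}", chain_to_string(&a));
}
```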
Can you verify what version you're running? 0.7.0 doesn't even build for me on Windows.
Also, given what I've found out about PowerShell's pipes and how they handle binary data, I think I'm going to declare PowerShell explicitly unsupported as an environment for this project. I don't think there's any way it can reasonably work in that environment, and since trying to reproduce a problem on Windows takes about an hour of setup for me with a time-limited VM, I don't believe it's a good use of my time to try to paper over its shortcomings.
gettext-rs does not compile on Windows because it uses a tar flag that only GNU tar supports. The GitHub issue for that is still open. Therefore, in order to compile muter 0.7.0, I replaced gettext-rs with gettext (a pure Rust implementation) and commented out the init call (gettext must be initialized using a different method, which I didn't bother to use).
FYI, PowerShell is cross-platform. I just installed it on Fedora 35 and got the same results as on Windows.
On Windows I use lots of Rust programs precisely because they work on every platform, so I can use the same tools on macOS and Linux too. So I know for sure that it is possible to write CLI programs that work in PowerShell. Of course, you are free to give up on it. Thank you anyway for all the time you have invested in this issue.
I understand that it's possible to write software that's cross-platform. However, as someone who primarily uses Linux, I focus my programs on Unix systems, since they're the easiest for me to support and be knowledgeable about. Windows and PowerShell are very different, and since I don't use them or personally care for them very much (and when I do use Windows, it's always with WSL), it's hard for me to be knowledgeable about them or to test them. I can probably follow up on things in WSL, however, since the additional burden of supporting it would be minimal.
It may be that PowerShell is available for Fedora, but I use Debian, where it has not been packaged because it contains non-free components. As such, I have no realistic way to test with it meaningfully on a regular basis.
You are welcome to submit patches if you'd like to improve the experience there, and I will consider them as appropriate, but I'm unable to support PowerShell in any meaningful way. You are also free not to use Muter, of course, if you'd prefer, or to live with its limitations. Sorry we couldn't make things work out better.
I downloaded the PowerShell RPM from GitHub. I will try to find time to see what is going on with muter and PowerShell and submit a PR. Thanks again for taking the time to address this issue and writing all these thoughtful responses.
On Windows, echo adds a carriage return, which muter does not like:
Thanks a lot for this awesome program!