crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.35k stars 1.62k forks source link

`Process.run(command, shell: true)` inconsistent on Windows and POSIX #12873

Open straight-shoota opened 1 year ago

straight-shoota commented 1 year ago

Process.run with shell: true behaves differently on Windows and POSIX platforms when the command executable does not exist:

Process.run("filedoesnotexist", shell: true)

Other error conditions might show similar inconsistent behaviour.

IMO raising FileNotFoundError is a more reasonable choice. The user asks to run a command and when that command doesn't even exist, this should usually be considered an error and raise. However, this might need to be considered a breaking change.

Btw. in Ruby `filedoesnotexist` raises the exception Errno::ENOENT on both Linux and Windows.

straight-shoota commented 1 year ago

Maybe not directly relevant, but there's also this related discussion about shell: true with args on Windows: #9030

j8r commented 1 year ago

FileNotFoundError will be incorrect on Unix-like systems, because the command can also be a built-in shell command. crystal eval 'Process.run("set", shell: true)' will succeed for instance, but set is not a file.

straight-shoota commented 1 year ago

I don't think that's exactly incorrect. Yes, a command could also be a shell built-in or alias. But when invoking that command fails with error 127 it means there is neither a built-in, alias nor file by that name. So file not found would be acceptable IMO. It could also use a more specific error type, though.

j8r commented 1 year ago

Other possible valid one word "commands" I found:

straight-shoota commented 1 year ago

What's the significance of "one word commands? Does that have any special meaning?

It shouldn't matter what the command is (it can even be a complex shell expression). If a command is not found, the shell returns exit code 127. That implicitly means no file was found that would match the command name. If there would be a more specific error type (such as CommandNotFound error), it should probably be a subtype of FileNotFound. I doubt that such specificity is necessary, though. Note that Process.run with shell: false also raises FileNotFound if the command cannot be found.

straight-shoota commented 1 year ago

Another difference is that on POSIX the shell besides returning exit code 127 also prints a message to stderr (something like --: 1: filedoesnotexist: not found). On Windows there's no output. I suppose this could be acceptable as functional differences of the operating system / shell being used for executing the command. It might still be worth considering to unify this aspect as well.

j8r commented 1 year ago

I mean, there isn't necessary a file involved in the process for shell expressions. I wasn't sure if there was any parsing done, with spaces for instance, so in my examples I didn't include any - but that's not the case.

straight-shoota commented 1 year ago

Well, yes when running a command that is actually a shell built-in, no file will be looked up and it doesn't matter if it exists or not. But then it's not in the error track because the command exists. When it's not a built-in, the shell looks up a file by the name of the command. A file is involved in that scenario. Or rather its absence. And that's a perfect fit for FileNotFound.

j8r commented 1 year ago

Faire enough. Another alternative is to still return a Process::Status with an #exit_reason getter, which will be FileNotFound (can be an enum member).

The advantage being not a breaking change.

j8r commented 1 year ago

I think I may have confused myself with the behavior of shell: false. It can be good to have in this case the same result between the two.

straight-shoota commented 1 year ago

Another alternative is to still return a Process::Status with an #exit_reason getter, which will be FileNotFound

Yeah that would align with #12284

But I think communicating this as Process::Status is not a good model.

When the command to be executed is not found, the shell doesn't even attempt to run any process because there is nothing to run. So this should be expressed as an error state of Process.run, i.e. raise an exception. Process.run("commandnotexists", shell: true) returning a Process::Status with exit code 127 exposes an implementation detail on POSIX. shell: true is implemented via /bin/sh -c which communicates that the command was not found via exit code 127. The public API has nothing to do with /bin/sh, so this specific error communication should not leak outside of Process.run.

straight-shoota commented 1 year ago

Due to the implementation of shell: true and the mechanics of spawning a process on POSIX systems, there is some inherently different behaviour. On Windows, Process.new(shell: true) directly spawns a process with the given command and raises if the command cannot be found. On POSIX, Process.new(shell: true) only spawns the shell (/bin/sh) which should usually succeed. The shell then executes the command and we can't know whether that was successful until the shell process exits, which requires to wait for it.

So a command not found error can be observed in different places: On windows it's directly raised in Process.new(shell: true). On POSIX, it's later in Process#wait. At that point the original command and whether it's run with a shell are unknown, because Process is a very thin wrapper and doesn't persist that information. That impacts the amount of information that can be available for an error message.

Process.new(shell: false) raises immediately on all platforms.

Process.run calls .new and #wait internally and can handle any kind of error. But when using the Process.new API with shell: true, errors about the command not being found will be reported in different places (.new vs #wait).

straight-shoota commented 1 year ago

There's an additional complication with mapping exit codes 127 and 126 to File::NotFoundError and File::AccessDeniedError on POSIX: Those exit codes don't necessarily need to come from the shell process telling that it couldn't run the command it received. It could also be passed through from the command that's run within the shell. Still telling that a command couldn't be run, but then it's inside the user-supplied command and not from the transparent wrapper around it. For a completely transparent implementation of shell: true we would need to differentiate where the exit status originates. If it's from the shell, raise an exception. Otherwise it's just the exit status of the command that's being executed in the shell. But that would be difficult to implement. I suppose it should probably be okay to just raise on 127 and 126 exit codes.

mominshaikhdevs commented 1 year ago

FileNotFoundError will be incorrect on Unix-like systems, because the command can also be a built-in shell command.

"aliases" and "built in shell commands" are not unix-clones only thing.

straight-shoota commented 1 year ago

For reference I have an experimental branch for dabbling with this: https://github.com/straight-shoota/crystal/pull/new/fix/process-run-raise-file_not_found

HertzDevil commented 1 year ago

I find it odd that POSIX calls /bin/sh -c but Windows doesn't call %COMSPEC% /C. At least for the backtick (where #9030 does not apply because there are no extra arguments) it should be possible to use built-in shell commands:

`exit 123`   # raises on Windows
$?.exit_code # 123 on POSIX, unreachable on Windows

Now if we do this explicitly, we could obtain the equivalent conventional exit code used to represent a non-existent file or command, and also an error will be printed to the console:

`cmd.exe /v /c filedoesnotexist & exit !ERRORLEVEL!`
# 'filedoesnotexist' is not recognized as an internal or external command,
# operable program or batch file.
$?.exit_code # => 9009

Process.run("filedoesnotexist", shell: true) should probably expand to something like that. The /v and exit !ERRORLEVEL! are necessary to return the actual exit code rather than letting cmd.exe overwrite it, although I imagine it makes escaping certain commands much harder or even impossible.

Also, we are dealing with the shell's syntax here, which is beyond the scope of Process.quote_windows; interpolating the result of Process.quote_windows into the argument to cmd.exe is still unsafe.

I'm not sure if turning `exit 127` into a file-not-found exception is a good idea, because the exit built-in command does indeed exist. And if we are dealing with arbitrary shell commands we cannot reliably identify the file that was meant to be found.

SamantazFox commented 1 year ago

There's an additional complication with mapping exit codes 127 and 126 to File::NotFoundError and File::AccessDeniedError on POSIX: Those exit codes don't necessarily need to come from the shell process telling that it couldn't run the command it received. It could also be passed through from the command that's run within the shell.

That's not supposed to happen. 126 and 127 are reserved return codes under POSIX:

Exit Code Meaning Example Comments
126 Command invoked cannot execute /dev/null Permission problem or command is not an executable
127 "command not found" illegal_command Possible problem with $PATH or a typo

Source: https://tldp.org/LDP/abs/html/exitcodes.html (I'd like to link to the POSIX specification, but it's paywalled)

straight-shoota commented 1 year ago

@SamantazFox Yes, the meaning of these exit codes is clearly defined. That shouldn't be an issue. But the reference can be ambiguous.

On windows, there's a clear indicator whether the specified shell command could be executed or not. Process.run("foobar", shell: true) raises when it fails to execute foobar. On POSIX, the exit code 126 or 127 can mean that foobar failed to execute. But it could also mean that foobar itself executed correctly, but it failed executing another command.

Compare the result of these largely equivalent command invokations on Windows and POSIX:

# Windows
Process.run("cmd /c NUL", shell: true).exit_code # => 1
Process.run("NUL", shell: true).exit_code # File::NotFoundError

# POSIX
Process.run("bash -c /dev/null", shell: true).exit_code # => 126
Process.run("/dev/null", shell: true).exit_code # => 126
SamantazFox commented 1 year ago

Ah, hmm, I see. I'd expect exit code 126/127 to raise on POSIX systems anyway. I spent an hour yesterday trying to figure out why Process.run wasn't raising, where the documentation mentions it should.

straight-shoota commented 1 year ago

Yes, the documentation is incorrect when using shell: true on a POSIX system.