curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/
Other
3.1k stars 99 forks source link

output errors do not lead to a non-zero exit code #305

Closed yahesh closed 2 months ago

yahesh commented 3 months ago

When providing trurl with a URL-encoded component that it is unhappy with (e.g. using URL-encoded characters smaller than %20) then the tool will not decode the component as requested but output a "trurl note", swallowing all output and only returning a single line break while the exit code is still 0. This is even the case with the --verify option set.

This could severely break pipelines that use trurl on untrusted URLs. Currently, tools using trurl cannot distinguish between successful parsing and URL-decoding errors based on the exit codes.

# this is okay
% trurl "?" --get "{host}" --verify ; echo "Exitcode: $?"
trurl error: No host part in the URL [?]
trurl error: Try trurl -h for help
Exitcode: 9
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl ".#fragment%00" --get "{fragment}" --verify ; echo "Exitcode: $?" 
trurl note: URL decode error, most likely because of rubbish in the input (fragment)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "./path%00" --get "{path}" --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (path)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl ".?query%00" --get "{query}" --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (query)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "user%00:password%00@." --get "{user}" --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (user)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "user%00:password%00@." --get "{password}" --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (password)

Exitcode: 0
bagder commented 3 months ago

It returns 0 when it can parse the URL, which it can in your examples. You might want --no-guess-scheme if you want it to not guess the scheme, as that makes it rather lenient.

yahesh commented 3 months ago

It returns 0 when it can parse the URL, which it can in your examples. You might want --no-guess-scheme if you want it to not guess the scheme, as that makes it rather lenient.

I'm not sure we're talking about the same thing here, because even with --no-guess-scheme set the tool will swallow the output due to the decoding error but still provide an exit code of 0.

# this is okay
% trurl "https://?" --get "{host}" --no-guess-scheme --verify ; echo "Exitcode: $?"
trurl error: No host part in the URL [https://?]
trurl error: Try trurl -h for help
Exitcode: 9
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "https://.#fragment%00" --get "{fragment}" --no-guess-scheme --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (fragment)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "https://./path%00" --get "{path}" --no-guess-scheme --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (path)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "https://.?query%00" --get "{query}" --no-guess-scheme --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (query)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "https://user%00:password%00@." --get "{user}" --no-guess-scheme --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (user)

Exitcode: 0
# THIS SHOULD RETURN A NON-ZERO EXIT CODE
% trurl "https://user%00:password%00@." --get "{password}" --no-guess-scheme --verify ; echo "Exitcode: $?"
trurl note: URL decode error, most likely because of rubbish in the input (password)

Exitcode: 0
bagder commented 3 months ago

Yes, because it returns 0 when it can parse the URL and it only adds a "note" about the problem to output what was asked.

We should probably consider making the output problems return non-zero as well.

bagder commented 3 months ago

It struck me we don't return error for those because you might pass in many URLs, and if you get a problem with --get on just one of them how is trurl supposed to behave then? Maybe we need another option... ?

For example , this is a legit command line:


$ trurl curl.se curl.se/?one%00 curl.se/?two -g{query}
yahesh commented 3 months ago

@bagder Curl has support for a --fail option. --fail in trurl could be described as being more strict than --verify.

bagder commented 3 months ago

It's not about "more strict" though. --verify is for being strict about (verifying) the input, this discussion is about using parts of the otherwise fine input when creating output. Basically problems with --get.

The new option would probably be documented as "return error if failing to produce output" or something. I think --fail is an odd name for that. How about --strict-get or --werror (the latter being "treat warnings as errors" as that's the name for the corresponding gcc option). Or something else.

yahesh commented 3 months ago

At the end, the decision is up to you, but I don't understand why you say that --fail is an odd name for the option when the word is even part of how you would document it:

"return error if failing to produce output"

bagder commented 3 months ago

I created #310, which instead adds a prefix for the get component to ask for "strict" or regular.

bagder commented 3 months ago

@yahesh that would make your command lines above return errors in the way you wanted them to, right?