curl / trurl

trurl is a command line tool for URL parsing and manipulation.
https://curl.se/trurl/
Other
3.1k stars 99 forks source link

`--get "{query:$NAME}"` only returns the first match #302

Closed yahesh closed 3 months ago

yahesh commented 3 months ago

I learned about trurl today and gave it go. You can get the value of a query parameter through --get "{query:$NAME}". It's also possible to add multiple query parameters with the same name through --append "query=$NAME=$VALUE1" --append "query=$NAME=$VALUE2". However, --get "{query:$NAME}" will only retrieve the first value.

This might lead to lost information. Some implementations (like PHP) support multiple values for a query parameter (like $NAME[]=$VALUE1&$NAME[]=$VALUE2). Some implementations (like PHP if the query name does not end in []) use the last occurrence of a query parameter instead of the first.

It would be great to have a way to get all values of a specific query parameter name returned.

bagder commented 3 months ago

Agreed. Should it require a different option? How should it provide them in the output?

yahesh commented 3 months ago

Given that parsing the output of --get "{query:$NAME} is risky already due to the URL-decoding, one could print the values of the different values in a tab-separated list so that handling a list of URLs is still supported. The support for outputting all query parameter values should probably be put behind a command option like --all-values.

So this could look like this:

% trurl "https://localhost?name=value1&name=value2" --get "{query:name}" --all-values
value1  value2

% trurl "https://localhost?name=value1&name=value2" "https://localhost?name=value3&name=value4" --get "{query:name}" --all-values
value1  value2
value3  value4

There are risks attached to this if the value contains characters like line breaks and tabs due them getting URL-decoded by default:

% trurl "https://localhost?name=value1&name=value2%0Avalue3%09value4" --get "{query:name}" --all-values
value1  value2
value3  value4

This can be circumvented by keeping the URL-encoding in place:

% trurl "https://localhost?name=value1&name=value2%0Avalue3%09value4" --get "{:query:name}" --all-values
value1  value2%0Avalue3%09value4
bagder commented 3 months ago

Given that parsing the output of --get "{query:$NAME} is risky already due to the URL-decoding

  1. Why is that risky? It is only risky if you do funny assumptions about what you get.
  2. It already supports getting the data URL encoded with --get {:query}.
bagder commented 3 months ago

But yeah, when the output can be "anything", making separators becomes difficult.

bagder commented 3 months ago

I suppose one way could be to allow trurl to mention how many instances there are, as then a user could cherry-pick the exact instance. For example: (brain-storming mode engaged)

$ trurl "example.com?a=firsrt&a=second&a=third" --get {query:a:amount}`
3
$ trurl "example.com?a=firsrt&a=second&a=third" --get {query:a:2}`
second
yahesh commented 3 months ago

From a shell-user perspective I'd personally prefer a GREPable output format. The counting and index features would require loops while something like a tab-separated list does not. In the case of a GREPable output format one could just string the trurl call together with an xargs call, etc.

bagder commented 3 months ago

How would a tab in the content be separated from the tab as a separator?

bagder commented 3 months ago

Ah, we do have:


trurl "https://localhost?name1=value&name1=again" --get '{query-all:name1}'
bagder commented 3 months ago

(Which has the separator problem but with space..)

yahesh commented 3 months ago

How would a tab in the content be separated from the tab as a separator?

It wouldn't in a secure way just like it currently isn't securely possible in this case:

% trurl "localhost?name=value1" "localhost?name=value2%0Avalue3" --get "{query:name}"
value1
value2
value3

But with keeping the URL-encoding in place, this is not an issue:

% trurl "localhost?name=value1" "localhost?name=value2%0Avalue3" --get "{:query:name}"
value1
value2%0Avalue3

Ah, we do have:

trurl "https://localhost?name1=value&name1=again" --get '{query-all:name1}'

Ah, I must have overlooked that in the manpage.

(Which has the separator problem but with space..)

Which is equivalent to the thing mentioned above. Keeping the URL-encoding in place during the parsing stage solves the separator problem:

% trurl "localhost?name=value1" "localhost?name=value2%0Avalue3&name=value4" --get "{:query-all:name}"
value1
value2%0Avalue3 value4
bagder commented 3 months ago

Maybe we should offer the user a way to specify the separator as a custom string, as then a user can increase the chances of it working...

yahesh commented 3 months ago

As {query-all:$NAME} exists but is a bit hard to find, it would be fine for me if query-all would be mentioned in the output of --help.

yahesh commented 3 months ago

Maybe we should offer the user a way to specify the separator as a custom string, as then a user can increase the chances of it working...

I personally don't think that will help a lot with handling query parameters in a secure way. Users already have the option to replace the separator with something of their liking via tr. Inline-separators will never be 100% secure when users don't take special characters from the URL-decoding process into account.

A safe way for users to handle separators e.g. looks like this:

% trurl "localhost?name=value1%20value2&name=value3" --get "{:query-all:name}" | xargs -n1
value1%20value2
value3

That's even the case with --accept-space enabled:

% trurl "localhost?name=value1 value2&name=value3" --accept-space --get "{:query-all:name}" | xargs -n1
value1+value2
value3
bagder commented 3 months ago

I personally don't think that will help a lot with handling query parameters in a secure way

If you set the separator to be for example 16 random letters, the risk they will be in the content is next to zero. But of course then the user needs to split the content into separate parts using the same boundary. Which admittedly is inconvenient.

It could be made to work like this:

$ trurl --boundary OOO example.com/?name=a&name=b&name=c --get "{query-all:name}"
OOOaOOObOOOc

Keeping the content URL encoded is another way and one that already is there, but then the user still needs to decode the data in a second inconvenient step.

bagder commented 3 months ago

I think that since this feature is actually already supported we can just call this solved. We can of course most certainly still improve the option to become more "intuitive" and easy to find in -h output and manpage etc.