Elasticsearch Version
8.15.0
Installed Plugins
No response
Java Version
bundled
OS Version
RockyLinux 9
Problem Description
URL-encoded characters in the query and path properties should not be decoded, because decoding modifies the real URL:
- Decoding percent-encoded %xx sequences in the query makes it impossible to decode the query parameters properly, e.g. %3d becomes = (equals sign).
- Decoding, for instance, %2f to / (slash) in the path breaks the real path of the URL.
One could argue that decoding the path is more open to discussion. However, as a rule, it should be possible to recreate the original URL by reassembling the parts produced by the processor. For consistency, it is also a bad idea to apply different reasoning to different parts of the URL. Currently the processor mixes parsing with URL-decoding, whereas its goal is only to parse (extract parts). Decoding percent-encoded sequences in the query can introduce reserved characters such as & and = that were escaped in the original URL, which makes proper decoding of the query parameters impossible.
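The round-trip rule can be checked with the JDK's java.net.URI. The following is a minimal illustration, not the processor's code: the class RoundTrip and its helpers are hypothetical, and the reassembly is deliberately simplified (scheme, authority, path and optional query only; fragments are ignored).

```java
import java.net.URI;

public class RoundTrip {
    // Rebuild "scheme://authority/path?query" from its parts.
    // Simplified on purpose: no fragment, query appended only when present.
    static String reassemble(String scheme, String authority, String path, String query) {
        StringBuilder sb = new StringBuilder()
                .append(scheme).append("://").append(authority).append(path);
        if (query != null) {
            sb.append('?').append(query);
        }
        return sb.toString();
    }

    // Reassemble from the raw (still percent-encoded) components.
    static String reassembleRaw(URI uri) {
        return reassemble(uri.getScheme(), uri.getRawAuthority(), uri.getRawPath(), uri.getRawQuery());
    }

    // Reassemble from the decoded components (what getPath()/getQuery() return).
    static String reassembleDecoded(URI uri) {
        return reassemble(uri.getScheme(), uri.getAuthority(), uri.getPath(), uri.getQuery());
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.acme.com/some/thing?a=123&b=x%26c%3dy",
            "http://www.acme.com/some%2fthing"
        };
        for (String original : samples) {
            URI uri = URI.create(original);
            System.out.println(original);
            System.out.println("  raw parts round-trip:     " + reassembleRaw(uri).equals(original));
            System.out.println("  decoded parts round-trip: " + reassembleDecoded(uri).equals(original));
        }
    }
}
```

For both sample URLs, only the raw parts reproduce the original string; the decoded parts turn %26, %3d and %2f into &, = and /.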
I believe the getRaw* methods of java.net.URI, such as getRawPath() and getRawQuery() (https://docs.oracle.com/javase/8/docs/api/java/net/URI.html), should be used in main/java/org/elasticsearch/ingest/common/UriPartsProcessor.java.
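A minimal sketch of the suggested change, assuming the processor parses its input with java.net.URI (as the reference to the getRaw* methods implies). The helpers rawParts and decodedParts are hypothetical and only extract path and query; they are not the actual UriPartsProcessor code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawParts {
    // Hypothetical helper: extract path and query with the getRaw* accessors,
    // so percent-encoded octets (%26, %3d, %2f, ...) are preserved as-is.
    static Map<String, String> rawParts(String url) {
        URI uri = URI.create(url);
        Map<String, String> parts = new LinkedHashMap<>();
        if (uri.getRawPath() != null) parts.put("path", uri.getRawPath());
        if (uri.getRawQuery() != null) parts.put("query", uri.getRawQuery());
        return parts;
    }

    // Same extraction with the decoding accessors, for comparison.
    static Map<String, String> decodedParts(String url) {
        URI uri = URI.create(url);
        Map<String, String> parts = new LinkedHashMap<>();
        if (uri.getPath() != null) parts.put("path", uri.getPath());
        if (uri.getQuery() != null) parts.put("query", uri.getQuery());
        return parts;
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.acme.com/some/thing?a=123&b=x%26c%3dy",
            "http://www.acme.com/some%2fthing"
        };
        for (String url : samples) {
            System.out.println(url);
            System.out.println("  raw:     " + rawParts(url));
            System.out.println("  decoded: " + decodedParts(url));
        }
    }
}
```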
Sample 1
Input URL: http://www.acme.com/some/thing?a=123&b=x%26c%3dy
Should give query: a=123&b=x%26c%3dy (which decodes to a = 123, b = x&c=y)
But currently gives query: a=123&b=x&c=y (which would decode to a = 123, b = x, c = y)
Sample 2
Input URL: http://www.acme.com/some%2fthing
Should give path: /some%2fthing
Currently gives path: /some/thing
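Sample 1 shows why the query has to stay encoded: a consumer must split on & and = first and percent-decode each name and value afterwards. The sketch below assumes Java 10+ for URLDecoder.decode(String, Charset); the class QueryParams is hypothetical. Note that URLDecoder implements application/x-www-form-urlencoded decoding, so it also turns + into a space.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParams {
    // Split the query into name=value pairs, THEN percent-decode each piece.
    // Decoding before splitting would turn %26 / %3d into real separators.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Raw query, as the processor should return it
        System.out.println(parse("a=123&b=x%26c%3dy")); // {a=123, b=x&c=y}
        // Already-decoded query, as the processor currently returns it
        System.out.println(parse("a=123&b=x&c=y"));     // {a=123, b=x, c=y}
    }
}
```

Once the processor has decoded the query, the information that & and = were part of the value of b is lost, and no later step can recover it.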
Steps to Reproduce
With the inputs from the samples above.
Logs (if relevant)
No response