Closed PragTob closed 1 week ago
Here is another example of this https://github.com/wojtekmach/req/issues/270#issue-1993428506 (cc @thbar):
iex> {:ok, conn} = Mint.HTTP.connect(:https, "api.oisemob.cityway.fr", 443)
iex> url = "/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"
iex> {:error, conn, e} = Mint.HTTP.request(conn, "GET", url, [], "")
iex> e
%Mint.HTTPError{
reason: {:invalid_request_target,
"/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"},
module: Mint.HTTP1
}
@PragTob if you URI-encode offending characters in your requests, would they still work? It works in the example above:
iex> {:ok, conn} = Mint.HTTP.connect(:https, "api.oisemob.cityway.fr", 443)
iex> url = "/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"
iex> url = String.replace(url, "|", URI.encode("|"))
iex> {:ok, conn, ref} = Mint.HTTP.request(conn, "GET", url, [], "")
Instead of making Mint more relaxed, while I'm not super looking forward to doing that, I think I can automatically URI-encode a given URL in Req. (I already do when using params: ...)
WDYT?
@wojtekmach thanks for the comment :pray:
The issue I had with URL encoding the query string was that I think it'll usually end up encoding things twice.
I.e., if I replace my query sanitizing code with just URI.encode, I get test failures like this:
1) test sanitize/1 if it's already encoded we don't re-encode it (SanitizeQueryStringTest)
test/sanitize_query_string_test.exs:41
Assertion with == failed
code: assert "key=val%3Dval" == sanitize("key=val%3Dval")
left: "key=val%3Dval"
right: "key=val%253Dval"
stacktrace:
test/sanitize_query_string_test.exs:42: (test)
And yeah I don't want to double encode values I think. Or maybe I'm not getting your approach :thinking:
A solution in Req would definitely work for me, I just wanted to start at mint as that's where the thrown error originates :)
To illustrate, I think it may be helpful to share the code & tests we're currently using for the escaping:
Fair enough about not wanting to double-encode. Are you able to use Req params feature by any chance?
iex> r = Req.new(url: "/", params: [key: "val=val"]) ; Req.Request.prepare(r).url |> to_string()
"/?key=val%3Dval"
@wojtekmach I don't think we are. Our use cases/issue are two:
One of our first attempts to fix it was to have Plug parse the params (we don't usually), but that also blew up and so I removed it, attempting to work on the String base to get the bad values out as early as possible :sweat_smile: - Thinking about it, this may be worth an issue in Plug as well, but working around that was "easi-er" for us :)
What would be the issue in Plug, that it shouldn't blow up i.e. have a relaxed parsing?
OK, I could see an argument to skip validation, similar to the recent :case_sensitive_headers option. Garbage in, garbage out, huh? But yeah, I'm curious if there are other solutions. Perhaps smarter sanitation in your code (or Req) that does not double-encode? Though that seems pretty fragile.
I forget what the exception was but yeah parsing failed. I'd need to mull it over.
I sadly agree with the "garbage in, garbage out" - exactly the same as :case_sensitive_headers - I've run into that particular issue more than once :cry: (not with Req/Mint).
Like, I like to be standard conformant and compliant so ideologically I support it - but the world out there isn't standard conformant/compliant and that's the world in which our applications have to live. The clinching point for me was Chrome, curl and wget all supporting these URLs without complaint. At that point it feels like a "lived" standard to me.
As for sanitation, the code I posted above does exactly that and I'm relatively confident in the tests. I'm not 100% sure I used the correct reserved character function above (but I think Mint kept blowing up when I used the other one). So, I think it's possible (before that we were playing whack-a-mole with hard-coded patterns).
That said, that might work, but I think I'd prefer for that to be a "non-issue" given an option in Mint. Unless there is some other problem with that that I don't see/understand. As long as it's opt-in, as in "be careful, you may shoot yourself in the foot with this", I think that's perfectly reasonable (but I also don't have the context of a maintainer here).
Even though the lack of bunny pictures is disappointing @PragTob, yeah I think this can make sense. The option can be on by default, but allowed to be false to skip validation. Any chance you would want to work on this? 🙃
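To make the "on by default, false to skip" idea concrete, here is a rough sketch of such an opt-out. It is not Mint's implementation: the module name `TargetCheck` and the option name `:validate_target` are made up for illustration, and the check only mirrors the general idea of validate_target!/1 (valid percent escapes and characters accepted by `URI.char_unescaped?/1` pass, everything else is rejected).

```elixir
defmodule TargetCheck do
  # Hypothetical module; it only illustrates an opt-out validation flag.
  @hex ~c"0123456789abcdefABCDEF"

  # :validate_target is an assumed option name; validation stays on by default.
  def check(target, opts \\ []) when is_binary(target) do
    if Keyword.get(opts, :validate_target, true) do
      validate(target, target)
    else
      # Caller opted out: garbage in, garbage out.
      :ok
    end
  end

  # A "%" followed by two hex digits is a valid percent escape; skip it whole.
  defp validate(<<?%, h1, h2, rest::binary>>, original) when h1 in @hex and h2 in @hex do
    validate(rest, original)
  end

  # Any other byte must be a reserved or unreserved URI character.
  defp validate(<<char, rest::binary>>, original) do
    if URI.char_unescaped?(char) do
      validate(rest, original)
    else
      {:error, {:invalid_request_target, original}}
    end
  end

  defp validate(<<>>, _original), do: :ok
end
```

With the default, `TargetCheck.check("/path?referrer={encSite}")` returns an error tuple; passing `validate_target: false` lets the same target through untouched.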
@whatyouhide 😂😂😂😂
I was contemplating it but thought maybe people get annoyed when I waste their premium screen space with bunny pictures all the time. I shall make up for my past transgressions!
@whatyouhide now that I tried to make up for my past transgressions and we have that out of the way, very happy to take a stab at implementing it!
fantastic, thanks for the pictures and the help @PragTob!
@whatyouhide thanks for a fantastic library, least I can do! :green_heart:
:wave:
Hi there everyone and thank you so much for mint - been a joy to use via req and helps us make a significant amount of HTTP requests! :green_heart:
Problem
In our context we're frequently confronted with query strings that aren't strictly standard-conformant. These contain "macros". Those look something like this:
referrer={encSite}
and can occur multiple times per URL.
There are many different patterns of these, usually occurring in the value part of the query params. Some more patterns I know of:
key=[%macro%]
key=${macro}
key=%%macro%%
"key=%ebuy!"
This blows up in Mint.HTTP1.Request.validate_target!/1 (and /2): https://github.com/elixir-mint/mint/blob/50b11d668b6a240b0d9b20c67fbb41a10a7410b1/lib/mint/http1/request.ex#L49-L65
While I understand enforcing standards, sometimes the real (old) applications out there make that hard.
Small example
```
iex(1)> HttpService.get!("https://some.url.com/path?rb=51439&ca=20863020&ra=1730132562831&_o=51439&_t=20863020") # in this example, a redirect happens
** (Req.HTTPError) invalid request target: "/path?c=bd8618c307ae9885a12561b7191e2cea&cid=5134455426882927793&referrer={encSite}&some=more"
    (req 0.5.6) lib/req.ex:1092: Req.request!/2
    iex:1: (file)
```
Solution
I'd be really happy about being able to provide an option where validating the target is skipped or made more lenient. Or some way in which I could easily accomplish that. I'm not sure of the consequences of this but yeah :)
(and then get to it through Req and Finch)
I'm happy to implement code to do this, once aligned on a solution.
I also have a bunch of affected URLs in tests (running against bypass) or as examples lying around. So, I can definitely test solutions.
Workaround
The current workaround we use is that we implemented code to sanitize the query string (working similar to validate_target!/1; it's fun, as the URLs we get are partially encoded and partially not, so running encode on all of it is not what we want).
The topic gets a bit harder though, as sometimes redirects will also include these broken URLs, which is also what you see in the example I posted above. As a workaround for this we're implementing our own redirect following now (so that we can sanitize the query strings on every hop). This is the part where I decided to open this issue, as I'd really like to get rid of this at least :sweat_smile: (but at best all the custom code I wrote for this).
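For anyone wanting to try this kind of workaround, here is a minimal sketch of a sanitizer that does not double-encode. It is not the code we actually run: `QuerySanitizer` is a hypothetical name, and the approach simply assumes that a `%` followed by two hex digits is an existing escape to keep, that anything `URI.char_unescaped?/1` accepts is fine, and that every other byte gets percent-encoded.

```elixir
defmodule QuerySanitizer do
  # Hypothetical helper, sketched for illustration only.
  @hex ~c"0123456789abcdefABCDEF"

  # Percent-encodes only the bytes a strict target check would reject.
  def sanitize(query) when is_binary(query), do: do_sanitize(query, [])

  # Keep an existing percent escape as-is so it is not encoded twice.
  defp do_sanitize(<<?%, h1, h2, rest::binary>>, acc) when h1 in @hex and h2 in @hex do
    do_sanitize(rest, [acc, ?%, h1, h2])
  end

  # Reserved and unreserved characters pass through; the rest is encoded.
  defp do_sanitize(<<char, rest::binary>>, acc) do
    if URI.char_unescaped?(char) do
      do_sanitize(rest, [acc, char])
    else
      do_sanitize(rest, [acc, encode_byte(char)])
    end
  end

  defp do_sanitize(<<>>, acc), do: IO.iodata_to_binary(acc)

  # Uppercase hex, e.g. "{" becomes "%7B".
  defp encode_byte(byte), do: "%" <> Base.encode16(<<byte>>)
end
```

With this, `QuerySanitizer.sanitize("referrer={encSite}")` gives `"referrer=%7BencSite%7D"`, while `"key=val%3Dval"` comes back unchanged. One known limitation: a macro such as `%ebuy!` happens to start with a valid escape (`%eb`), so it is left alone rather than encoded.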
Behavior of other HTTP clients
For funzies I tried the URL in multiple HTTP clients, and it worked in all of them: Firefox, Chrome, curl, wget
So while I don't think it's standard-conformant, it seems to be common practice among some of the most popular/widely used "HTTP" clients. So, I think at least having the option to process it would be nice.
Again, thanks a lot for all your work in providing mint and friends. It's much appreciated! :green_heart: