elixir-mint / mint

Functional HTTP client for Elixir with support for HTTP/1 and HTTP/2 🌱
Apache License 2.0

Setting user-agent header leads to protocol error #349

Closed ppatrzyk closed 2 years ago

ppatrzyk commented 2 years ago

Hello,

It seems that when I set the user-agent header to certain values, requests fail with a :protocol_error.

I'm using Mint 1.4.1 and the ConnectionProcess GenServer module taken from your docs.

This works fine:

{:ok, pid} = ConnectionProcess.start_link({:https, "www.shopify.com", 443})
response = ConnectionProcess.request(pid, "GET", "/", [{"user-agent", "whatever"}], nil)

But this fails with an error:

response = ConnectionProcess.request(pid, "GET", "/", [{"user-agent", "curl/7.68.0"}], nil)

%Mint.HTTPError{module: Mint.HTTP2, reason: {:protocol_error, "trailing headers didn't set the END_STREAM flag"}}}

Fetching this URL with curl works fine.

whatyouhide commented 2 years ago

Hey @ppatrzyk, can you paste the output of curl using the -v option?

ppatrzyk commented 2 years ago

Sure:

$ curl -v https://www.shopify.com
*   Trying 104.16.255.71:443...
* TCP_NODELAY set
* Connected to www.shopify.com (104.16.255.71) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=www.shopify.com
*  start date: Jul  5 00:00:00 2021 GMT
*  expire date: Jul  4 23:59:59 2022 GMT
*  subjectAltName: host "www.shopify.com" matched cert's "www.shopify.com"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x558cb2bf8e30)
> GET / HTTP/2
> Host: www.shopify.com
> user-agent: curl/7.68.0
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 103 
< link: <https://cdn.shopify.com/shopifycloud/brochure/assets/application-15065a20bcc439ed82721b3579f52386963fffcf5dc3a07b645272d0f9832fef.css>; as=style; rel=preload, <https://cdn.shopify.com/shopifycloud/brochure/assets/manifests/home/index-15283b6827a3bcadb31687c4316738bc2c978aa6775b7ccc94cf0deb8fc97081.css>; as=style; rel=preload
< HTTP/2 200 
< date: Thu, 24 Feb 2022 18:30:51 GMT
< content-type: text/html; charset=utf-8
< vary: Accept-Encoding
< accept-ch: Save-Data
< link: <https://cdn.shopify.com/shopifycloud/brochure/assets/application-15065a20bcc439ed82721b3579f52386963fffcf5dc3a07b645272d0f9832fef.css>; rel=preload; as=style,<https://cdn.shopify.com/shopifycloud/brochure/assets/manifests/home/index-15283b6827a3bcadb31687c4316738bc2c978aa6775b7ccc94cf0deb8fc97081.css>; rel=preload; as=style
< etag: W/"21b197a8bb7f8c06a35214e9640b96c0"
< cache-control: max-age=0, private, must-revalidate
< set-cookie: _shopify_y=5df4de32-eb46-45f5-a2ea-cc176be8ee0f; domain=.shopify.com; path=/; expires=Fri, 24 Feb 2023 18:30:50 GMT; SameSite=Lax; secure
< set-cookie: _shopify_s=411ec371-c343-4d60-8fe3-d1babcdd4eff; domain=.shopify.com; path=/; expires=Thu, 24 Feb 2022 19:00:50 GMT; SameSite=Lax; secure
< set-cookie: _y=5df4de32-eb46-45f5-a2ea-cc176be8ee0f; domain=.shopify.com; path=/; expires=Fri, 24 Feb 2023 18:30:50 GMT; SameSite=Lax; secure
< set-cookie: _s=411ec371-c343-4d60-8fe3-d1babcdd4eff; domain=.shopify.com; path=/; expires=Thu, 24 Feb 2022 19:00:50 GMT; SameSite=Lax; secure
< x-request-id: ec20b80e-99e9-466f-91be-337dd52b11dc
< x-runtime: 0.025768
< strict-transport-security: max-age=15552000; includeSubDomains; preload
< x-frame-options: deny
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block; report=/xss-report?source%5Baction%5D=index&source%5Bapp%5D=Brochure&source%5Bcontroller%5D=home&source%5Bdomain%5D=www.shopify.com&source%5Bsection%5D=brochure&source%5Buuid%5D=ec20b80e-99e9-466f-91be-337dd52b11dc
< x-download-options: noopen
< x-permitted-cross-domain-policies: none
< content-security-policy-report-only: default-src 'self' https:; child-src 'self' https: data:; connect-src 'self' https: wss:; font-src 'self' https: data:; img-src 'self' https: data:; media-src 'self' https: data:; object-src 'self' https:; script-src 'self' https: 'unsafe-inline' 'unsafe-eval'; style-src 'self' https: 'unsafe-inline'; report-uri /csp-report?source%5Baction%5D=index&source%5Bapp%5D=Brochure&source%5Bcontroller%5D=home&source%5Bdomain%5D=www.shopify.com&source%5Bsection%5D=brochure&source%5Buuid%5D=ec20b80e-99e9-466f-91be-337dd52b11dc
< server-timing: processing;dur=28, socket_queue;dur=2.342, util;dur=3.0
< x-dc: gcp-us-east1
< cf-cache-status: DYNAMIC
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< server: cloudflare
< cf-ray: 6e2ac477fe172074-AMS
< alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
< 

...body follows here
wojtekmach commented 2 years ago

Interesting, it looks like Shopify might be doing user-agent sniffing, because this particular problem seems to occur only with a curl user agent:

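# Mint 1.4.1 is the version from the report; castore provides the CA trust
# store Mint uses by default for :https connections.
Mix.install([
  {:mint, "== 1.4.1"},
  {:castore, "~> 0.1"}
])

defmodule Main do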
  def main do
    {:ok, pid} = ConnectionProcess.start_link({:https, "www.shopify.com", 443})

    {:ok, %{status: 200}} =
      ConnectionProcess.request(pid, "GET", "/", [{"user-agent", "mint/7.68.0"}], nil)

    ConnectionProcess.request(pid, "GET", "/", [{"user-agent", "curl/7.68.0"}], nil)
  end
end

defmodule ConnectionProcess do
  use GenServer

  require Logger

  defstruct [:conn, requests: %{}]

  def start_link({scheme, host, port}) do
    GenServer.start_link(__MODULE__, {scheme, host, port})
  end

  def request(pid, method, path, headers, body) do
    GenServer.call(pid, {:request, method, path, headers, body})
  end

  ## Callbacks

  @impl true
  def init({scheme, host, port}) do
    case Mint.HTTP.connect(scheme, host, port) do
      {:ok, conn} ->
        state = %__MODULE__{conn: conn}
        {:ok, state}

      {:error, reason} ->
        {:stop, reason}
    end
  end

  @impl true
  def handle_call({:request, method, path, headers, body}, from, state) do
    # In both the successful case and the error case, we make sure to update the connection
    # struct in the state since the connection is an immutable data structure.
    case Mint.HTTP.request(state.conn, method, path, headers, body) do
      {:ok, conn, request_ref} ->
        state = put_in(state.conn, conn)
        # We store the caller this request belongs to and an empty map as the response.
        # The map will be filled with status code, headers, and so on.
        state = put_in(state.requests[request_ref], %{from: from, response: %{}})
        {:noreply, state}

      {:error, conn, reason} ->
        state = put_in(state.conn, conn)
        {:reply, {:error, reason}, state}
    end
  end

  @impl true
  def handle_info(message, state) do
    # We should handle the error case here as well, but we're omitting it for brevity.
    case Mint.HTTP.stream(state.conn, message) do
      :unknown ->
        _ = Logger.error(fn -> "Received unknown message: " <> inspect(message) end)
        {:noreply, state}

      {:ok, conn, responses} ->
        state = put_in(state.conn, conn)
        state = Enum.reduce(responses, state, &process_response/2)
        {:noreply, state}
    end
  end

  defp process_response({:status, request_ref, status}, state) do
    put_in(state.requests[request_ref].response[:status], status)
  end

  defp process_response({:headers, request_ref, headers}, state) do
    put_in(state.requests[request_ref].response[:headers], headers)
  end

  defp process_response({:data, request_ref, new_data}, state) do
    update_in(state.requests[request_ref].response[:data], fn data -> (data || "") <> new_data end)
  end

  # When the request is done, we use GenServer.reply/2 to reply to the caller that was
  # blocked waiting on this request.
  defp process_response({:done, request_ref}, state) do
    {%{response: response, from: from}, state} = pop_in(state.requests[request_ref])
    GenServer.reply(from, {:ok, response})
    state
  end

  # A request can also error, but we're not handling the erroneous responses for
  # brevity.
end

Main.main()
whatyouhide commented 2 years ago

Yeah, maybe Shopify is using an HTTP/2 server that special-cases curl in some way? @ppatrzyk @wojtekmach, did either of you try to reproduce this with other HTTP/2 clients? I won't have time to investigate this week.

wojtekmach commented 2 years ago

Here's an example with Gun. It works fine with a non-curl UA and crashes with the curl one, too:

Mix.install([
  {:gun, "~> 1.3"}
])

{:ok, conn_pid} = :gun.open('www.shopify.com', 443)
{:ok, :http2} = :gun.await_up(conn_pid)

# ua = "foo"
ua = "curl/7.68.0"
stream_ref = :gun.get(conn_pid, "/", [{<<"user-agent">>, ua}])
{:response, :nofin, status, headers} = :gun.await(conn_pid, stream_ref)
{:ok, body} = :gun.await_body(conn_pid, stream_ref)
IO.inspect(status: status, headers: headers, body: body)
** (MatchError) no match of right hand side value: {:inform, 103, [{"link", "<https://cdn.shopify.com/shopifycloud/brochure/assets/application-2a87c36fb9f9ebac03085212d50f96dd0dc5b9d19bf50171392f4721846881b5.css>; as=style; rel=preload, <https://cdn.shopify.com/shopifycloud/brochure/assets/manifests/home/index-698c5afefe9553361e0af3953b0b9655ed59d09f933b05f6158d911e939ea0a0.css>; as=style; rel=preload"}]}
    gun.exs:11: (file)
    (elixir 1.14.0-dev) lib/code.ex:1228: Code.require_file/2
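Judging from the MatchError, Gun 1.x returns the 103 Early Hints response from :gun.await/2 as a separate {:inform, 103, headers} tuple before the final {:response, ...} tuple, and the script's pattern simply doesn't expect it. A minimal sketch of a caller that skips such informational tuples follows; GunAwait and await_final/2 are hypothetical names, not part of Gun, and the helper takes a zero-arity function so it can wrap :gun.await(conn_pid, stream_ref).

```elixir
defmodule GunAwait do
  # Hypothetical helper, not part of Gun. `next` is a zero-arity function that
  # returns the next message for the stream, e.g.
  # fn -> :gun.await(conn_pid, stream_ref) end
  # Informational {:inform, status, headers} tuples are collected and skipped;
  # the first other result is returned together with the collected 1xx list.
  def await_final(next, informs \\ []) when is_function(next, 0) do
    case next.() do
      {:inform, status, headers} ->
        await_final(next, [{status, headers} | informs])

      final ->
        # Restore the arrival order of the informational responses.
        {final, Enum.reverse(informs)}
    end
  end
end
```

In the script above this would replace the :gun.await/2 match, e.g. {{:response, :nofin, status, headers}, _informs} = GunAwait.await_final(fn -> :gun.await(conn_pid, stream_ref) end). Whether the rest of the stream then decodes cleanly under Gun 1.x is not something this thread confirms.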
xinz commented 2 years ago

With Gun ~> 2.0.0-rc.2, both "foo" and "curl/xxx" work fine. Note that with the "curl/xxx" user agent, shopify.com first sends a 103 Early Hints informational response, as shown in the earlier curl output:

* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 103 
< link: <https://cdn.shopify.com/shopifycloud/brochure/assets/application-15065a20bcc439ed82721b3579f52386963fffcf5dc3a07b645272d0f9832fef.css>; as=style; rel=preload, <https://cdn.shopify.com/shopifycloud/brochure/assets/manifests/home/index-15283b6827a3bcadb31687c4316738bc2c978aa6775b7ccc94cf0deb8fc97081.css>; as=style; rel=preload
< HTTP/2 200 
< date: Thu, 24 Feb 2022 18:30:51 GMT
< content-type: text/html; charset=utf-8
...

It looks like Mint needs to improve how it decodes the headers of an HTTP/2 stream in this case.
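For reference, HTTP/2 (RFC 7540 §8.1, now RFC 9113 §8.1) allows a response to consist of zero or more header blocks carrying 1xx status codes, then exactly one block with the final status, then optional DATA, then an optional trailer block, which must set END_STREAM. Mint's error message matches the last rule; our reading is that Mint 1.4.1 counted the 103 block as the final response, so the real 200 block looked like trailers without END_STREAM. The sketch below is a hypothetical classifier illustrating those rules, not Mint's code; each block is modelled as {status, end_stream?}, with status set to nil for a block that has no :status pseudo-header.

```elixir
defmodule H2ResponseHeaders do
  # Hypothetical classifier, not Mint's implementation. Each element is
  # {status, end_stream?}: status is the integer from the :status pseudo-header,
  # or nil when the block carries none (i.e. trailers).
  def classify(blocks), do: classify(blocks, :awaiting_final, [])

  defp classify([], _state, acc), do: {:ok, Enum.reverse(acc)}

  # Before the final response, a 1xx block is informational and must not end the stream.
  defp classify([{status, false} | rest], :awaiting_final, acc)
       when is_integer(status) and status in 100..199 do
    classify(rest, :awaiting_final, [:informational | acc])
  end

  # The first non-1xx block is the final response.
  defp classify([{status, _end_stream?} | rest], :awaiting_final, acc)
       when is_integer(status) and status >= 200 do
    classify(rest, :got_final, [:final | acc])
  end

  # After the final response, another block is trailers and must set END_STREAM.
  defp classify([{nil, true} | rest], :got_final, acc),
    do: classify(rest, :done, [:trailers | acc])

  defp classify([{nil, false} | _rest], :got_final, _acc),
    do: {:error, "trailing headers didn't set the END_STREAM flag"}

  # Anything else (e.g. a 1xx with END_STREAM, a block after trailers) is malformed.
  defp classify(_blocks, _state, _acc), do: {:error, :malformed}
end
```

With these rules the Shopify sequence, H2ResponseHeaders.classify([{103, false}, {200, false}]), is a valid informational-then-final response rather than a trailer error.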

ppatrzyk commented 2 years ago

This is not specific to the curl UA; for example, a Firefox user agent fails with the same error:

{:ok, pid} = ConnectionProcess.start_link({:https, "www.shopify.com", 443})
response = ConnectionProcess.request(pid, "GET", "/", [{"user-agent", "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0"}], nil)

As @xinz already mentioned, this works fine in other HTTP/2 libraries; I have also tested it with Python's httpx.

whatyouhide commented 2 years ago

This was a legit Mint bug with informational (1xx) responses 🙃 Fixed in #363. Thanks for the detailed report and the patience! 💟
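On the application side, a callback like ConnectionProcess.process_response/2 may then see more than one :status/:headers pair for the same request. A minimal sketch, assuming (as we understand the Mint.HTTP.stream/2 documentation after the fix) that each 1xx response arrives as its own {:status, ref, code} followed by {:headers, ref, headers}, ahead of the single final non-1xx status. ResponseAcc and apply_event/2 are hypothetical names; the reducer works on one request's response map so it could be called from process_response/2.

```elixir
defmodule ResponseAcc do
  # Hypothetical reducer for a single request's response map. Assumes each 1xx
  # arrives as {:status, ref, code} followed by {:headers, ref, headers}, before
  # the final non-1xx :status. Trailers are not handled: like the original
  # ConnectionProcess, a later :headers event overwrites :headers.
  def apply_event({:status, _ref, status}, response) when status in 100..199 do
    # Remember the informational status until its headers arrive.
    Map.put(response, :pending_informational, status)
  end

  def apply_event({:status, _ref, status}, response), do: Map.put(response, :status, status)

  def apply_event({:headers, _ref, headers}, %{pending_informational: code} = response) do
    # Headers belonging to a 1xx go into :informational, not the final :headers.
    response
    |> Map.delete(:pending_informational)
    |> Map.update(:informational, [{code, headers}], &(&1 ++ [{code, headers}]))
  end

  def apply_event({:headers, _ref, headers}, response), do: Map.put(response, :headers, headers)

  def apply_event({:data, _ref, data}, response),
    do: Map.update(response, :data, data, &(&1 <> data))

  # :done and anything else leave the map unchanged here.
  def apply_event(_other, response), do: response
end
```

Keeping the 1xx headers in a separate :informational list means the 103's link header no longer replaces the final response headers, while the final :status stays 200.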