elixir-mint / mint

Functional HTTP client for Elixir with support for HTTP/1 and HTTP/2 šŸŒ±
Apache License 2.0
1.36k stars 106 forks source link

`:ssl_closed` unknown messages on `www.apple.com` #419

Closed 1player closed 6 months ago

1player commented 6 months ago

I'm writing a web crawler that opens a connection per host with Mint. I'm noticing that, on www.apple.com, Mint.HTTP.stream often fails with incoming {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.37>, :tls_connection, :undefined}, [#PID<0.626.0>, #PID<0.625.0>]}} messages.

I'm wondering if these messages should not be handled by Mint. Also, do these messages mean that the connection is now invalid, and I should close and reopen it?

BTW, I'm using Mint in HTTP1-only mode.

1player commented 6 months ago

After more reading, it seems that, if multiple connections are open, it is possible to receive :ssl_closed messages that are referring to another process, and they're benign and can be safely ignored. Can you please confirm that's the case?

whatyouhide commented 6 months ago

Hi @1player, thanks for the issue! šŸ™

if multiple connections are open, it is possible to receive :ssl_closed messages that are referring to another process

These messages are not supposed to cross the boundary of processes, no, unless you mess with Mint.HTTP.controlling_process/2.

When you say that Mint.HTTP.stream/2 is failing with the {:ssl_closed, ...} error, what do you mean exactly by failing? What is stream/2 returning?

1player commented 6 months ago

No, I don't touch Mint.HTTP.controlling_process/2 anywhere.

By failing I mean stream/2 returns :unknown, and the message usually is {:ssl_closed, ...}. I would love to give you a way to reproduce this, but it doesn't seem to be easy to: I'm using Mint on a web crawler, and this kind of error happens only after a few dozen requests on www.apple.com servers. Never seen it anywhere on any other host, and is very timing dependent. I've added a 100ms delay between requests and that seems to lower the amount of :unknown :ssl_closed messages on Apple's webserver.

By design my crawler has a single connection per host:port pair, so there would is only one GenServer/Mint connection to www.apple.com:443 on the entire system.

It's worth pointing out that www.apple.com is one of those webservers that do not respect the Connection: keep-alive mode that's default in HTTP/1.1, so often the connection is closed halfway during the second request (first request terminated, connection closes on us after the headers of the second request were already written out by Mint). It is probably because of Apple having a weird, non-conformant web server, but I was wondering if it might indicate a bug elsewhere. I tried reproducing it with cURL, but it never made any mention of the SSL connection being closed under its nose.

whatyouhide commented 6 months ago

@1player one thing that Iā€™m thinking is this: is there a possibility that you get multiple of these {:ssl_closed, ...} messages in that same GenServer that holds the Mint connection?

If that were the case, you'd probably end up in a situation where the first :ssl_closed message gets "consumed" correctly by stream/2. That closes the socket. Then, subsequent :ssl_closed messages would be marked as :unknown because now the Mint connection has no socket to match messages against anymore.

Maybe you can try logging any message you get to the GenServer + changes you make to the Mint conn to see if this is what's going on?

whatyouhide commented 6 months ago

@1player ping?

1player commented 6 months ago

So, I tried logging all :ssl_closed messages coming in, and whether Mint ingests them correctly (returning :ok or :error) or not (returning :unknown). It seems that most :ssl_closed messages are not duplicate as you originally suggested, but arrive in the middle of regular message flow.

Below is the log, isolating all requests to www.apple.com, with some hand written notes to follow along.

[info] https://www.apple.com: Written HEAD request to /    <-- Mint.HTTP.request was successful
[info] [200 text/html] HEAD https://www.apple.com/      <-- We got a full response
[info] https://www.apple.com: Written HEAD request to /us/shop/goto/store
[info] [303 application/octet-stream] HEAD https://www.apple.com/us/shop/goto/store
[error] https://www.apple.com: First failure writing HEAD request to /us/shop/goto/buy_mac: %Mint.TransportError{reason: :closed}
[info] https://www.apple.com: Written HEAD request to /us/shop/goto/buy_mac    <-- After reconnecting, the above request went through
[error] https://www.apple.com: Mint.HTTP.stream returned :unknown: {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.20>, :tls_connection, :undefined}, [#PID<0.527.0>, #PID<0.526.0>]}}
[info] [303 application/octet-stream] HEAD https://www.apple.com/us/shop/goto/buy_mac
[error] https://www.apple.com: First failure writing HEAD request to /us/shop/goto/buy_ipad: %Mint.TransportError{reason: :closed}
[info] https://www.apple.com: Written HEAD request to /us/shop/goto/buy_ipad
[error] https://www.apple.com: Mint.HTTP.stream returned :unknown: {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.23>, :tls_connection, :undefined}, [#PID<0.536.0>, #PID<0.535.0>]}}
[info] [303 application/octet-stream] HEAD https://www.apple.com/us/shop/goto/buy_ipad
[error] https://www.apple.com: First failure writing HEAD request to /us/shop/goto/buy_iphone: %Mint.TransportError{reason: :closed}
[info] https://www.apple.com: Written HEAD request to /us/shop/goto/buy_iphone
[error] https://www.apple.com: Mint.HTTP.stream returned :unknown: {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.26>, :tls_connection, :undefined}, [#PID<0.540.0>, #PID<0.539.0>]}}
[info] [303 application/octet-stream] HEAD https://www.apple.com/us/shop/goto/buy_iphone
[error] https://www.apple.com: First failure writing HEAD request to /us/shop/goto/buy_watch: %Mint.TransportError{reason: :closed}
[info] https://www.apple.com: Written HEAD request to /us/shop/goto/buy_watch
[error] https://www.apple.com: Mint.HTTP.stream returned :unknown: {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.29>, :tls_connection, :undefined}, [#PID<0.544.0>, #PID<0.543.0>]}}
[info] [303 application/octet-stream] HEAD https://www.apple.com/us/shop/goto/buy_watch

As you can see, there are a ton of request failures because the socket closes on us (so I have to reconnect, and try again, which usually works, and the :ssl_closed messages come AFTER the request has been written. This behaviour of course applies only to www.apple.com, I have no issue whatsoever with any other server, which seems to indicate a very non-compliant server.

This makes me wonder though: each connection process has its own internal request queue, and how it works is that as soon as a request completes, I immediately start to send the next one. As GenServers are basically a single-thread, I am wondering if it is possible that between two requests there might be a :ssl_closed or other valid message in the GenServer mailbox yet to be processed. Perhaps I should yield control, to get a chance for the GenServer to catch up with its queued Mint/TCP/SSL messages before starting to write the next request. Does it make any sense?

1player commented 6 months ago

Yes, my theory was correct: I was not giving enough time and space for the GenServer to process pending messages between requests. Replacing any call to process the next in queue with an asynchronous message (send(self(), :process_next)) has reduced this issue to a minimum. I'm still seeing it every 20 requests or so, so perhaps the fix isn't perfect just yet, but it is certainly a bug in my code, and not on Mint.

Thanks for the help!