HttpWebRequest timeout broken with proxy/ssl, repro attached

coderb commented 4 years ago

running netcoreapp3.1 under win x64.

see https://github.com/coderb/DotnetCore31TimeoutBug

i've isolated a case where HttpWebRequest hangs and does not timeout properly even though Timeout is set.

in this particular case i am tunneling SSL through a proxy (via CONNECT) and the proxy server accepts the connection, completes the SSL handshake, returns the response http headers (but does not include the Content-Length header) and then hangs.

in most other cases the client times out properly (eg if the server hangs prior to ssl handshake or if the server does return the Content-Length header.)

i found this while investigating what i thought were hangs due to ReadWriteTimeout being unsupported #21845 #28404 ( @stephentoub please hear my call ) but this seems to be a distinct bug.

i would recommend auditing the code involved in the handling of timeouts when ssl and or a proxy is used because it seems to be a potentially lesser tested code path.

here's the stack on the client when it is not timing out:

    [Managed to Native Transition]  
    System.Net.Sockets.dll!System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle handle, System.Span<byte> buffer, System.Net.Sockets.SocketFlags socketFlags, out int bytesTransferred) Unknown
    System.Net.Sockets.dll!System.Net.Sockets.Socket.Receive(byte[] buffer, int offset, int size, System.Net.Sockets.SocketFlags socketFlags, out System.Net.Sockets.SocketError errorCode) Unknown
    System.Net.Sockets.dll!System.Net.Sockets.NetworkStream.Read(byte[] buffer, int offset, int size)   Unknown
    System.Net.Http.dll!System.Net.Http.HttpConnection.ReadBuffered(System.Span<byte> destination)  Unknown
    System.Net.Http.dll!System.Net.Http.HttpConnection.RawConnectionStream.Read(System.Span<byte> buffer)   Unknown
    System.Net.Http.dll!System.Net.Http.HttpBaseStream.Read(byte[] buffer, int offset, int count)   Unknown
    System.Net.Security.dll!System.Net.Security.SslStream.FillBufferAsync<System.Net.Security.SslStream.SslReadSync>(System.Net.Security.SslStream.SslReadSync adapter, int minSize)    Unknown
    System.Net.Security.dll!System.Net.Security.SslStream.ReadAsyncInternal<System.Net.Security.SslStream.SslReadSync>(System.Net.Security.SslStream.SslReadSync adapter, System.Memory<byte> buffer)   Unknown
    System.Net.Security.dll!System.Net.Security.SslStream.Read(byte[] buffer, int offset, int count)    Unknown
    System.Private.CoreLib.dll!System.IO.Stream.Read(System.Span<byte> buffer)  Unknown
    System.Net.Http.dll!System.Net.Http.HttpConnection.Read(System.Span<byte> destination)  Unknown
    System.Net.Http.dll!System.Net.Http.HttpConnection.ConnectionCloseReadStream.Read(System.Span<byte> buffer) Unknown
>   BugClient.dll!ConsoleApp1.Program.ReadToMemoryStream(System.IO.Stream stream) Line 33   C#
    BugClient.dll!ConsoleApp1.Program.Run() Line 22 C#
    BugClient.dll!ConsoleApp1.Program.Main(string[] args) Line 13   C#

and another stack i captured somewhere else (no repro)

    [Managed to Native Transition]  
    System.Net.Sockets.dll!System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle handle, System.Span<byte> buffer, System.Net.Sockets.SocketFlags socketFlags, out int bytesTransferred) Unknown
    System.Net.Sockets.dll!System.Net.Sockets.Socket.Receive(byte[] buffer, int offset, int size, System.Net.Sockets.SocketFlags socketFlags, out System.Net.Sockets.SocketError errorCode) Unknown
    System.Net.Sockets.dll!System.Net.Sockets.NetworkStream.Read(byte[] buffer, int offset, int size)   Unknown
    System.Net.Http.dll!System.Net.Http.HttpConnection.ReadBuffered(System.Span<byte> destination) Line 2214    C#
    System.Net.Http.dll!System.Net.Http.HttpConnection.RawConnectionStream.Read(System.Span<byte> buffer) Line 848  C#
    System.Net.Http.dll!System.Net.Http.HttpBaseStream.Read(byte[] buffer, int offset, int count) Line 107  C#
    System.Net.Security.dll!System.Net.Security.SslStream.FillBufferAsync<System.Net.Security.SslStream.SslReadSync>(System.Net.Security.SslStream.SslReadSync adapter, int minSize)    Unknown
    System.Net.Security.dll!System.Net.Security.SslStream.ReadAsyncInternal<System.Net.Security.SslStream.SslReadSync>(System.Net.Security.SslStream.SslReadSync adapter, System.Memory<byte> buffer)   Unknown
    System.Net.Security.dll!System.Net.Security.SslStream.Read(byte[] buffer, int offset, int count)    Unknown
    System.Private.CoreLib.dll!System.IO.Stream.Read(System.Span<byte> buffer)  Unknown
    System.Net.Http.dll!System.Net.Http.HttpConnection.Read(System.Span<byte> destination) Line 2171    C#
    System.Net.Http.dll!System.Net.Http.HttpConnection.ContentLengthReadStream.Read(System.Span<byte> buffer) Line 535  C#
>   System.Net.Http.dll!System.Net.Http.HttpBaseStream.Read(byte[] buffer, int offset, int count) Line 107  C#

System.Net.Http.HttpClient System.Net.HttpWebReqeust

Dotnet-GitSync-Bot commented 4 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 4 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

stephentoub commented 4 years ago

BugClient.dll!ConsoleApp1.Program.ReadToMemoryStream

The call stack points to your ReadToMemoryStream method, which is only invoked in your program after GetResponse has returned and the response stream accessed. My understanding is that HttpWebRequest.Timeout is designed to apply up until GetResponse completes: https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.timeout?view=netframework-4.8 "Timeout is the number of milliseconds that a subsequent synchronous request made with the GetResponse method waits for a response" and that if you wanted GetResponse to wait for the whole response stream before returning, you'd need to set AllowReadStreamBuffering to true. Someone on @dotnet/ncl can correct me if I'm off base here.

coderb commented 4 years ago

@stephentoub it's hard to infer why the client times out in ReadToMemoryStream if the server sends a Content-Length header but there is no timeout otherwise and how this would be "per spec". But also as I understand what you are saying, the client really should hang on the subsequent stream Read due again to lack of support for ReadWriteTimeout.

Sorry, but could you remind me one more time why it is so hard to add the missing calls to set the read/write timeout on ~~NetworkStream.ReadTimeout/WriteTimeout~~ HttpConnection.socket (via ~~ioctl~~ SetSocketOption) for dotnet core?

This really is causing me major headaches. I've spent 2 weeks on a rather large solution adding async code paths but ensuring that no synchronous network reads/writes are ever done is a herculean task.

coderb commented 4 years ago

well, since it looks like this issue is set to fester, here's a workaround for those unlucky future travelers who step on this landmine:

        public Stream GetResponseStream(HttpWebResponse resp, int timeoutMillis) {
            var stream = resp.GetResponseStream();
#if NETCOREAPP3_1
            UnbuggerStream(stream, timeoutMillis);
#endif
            return stream;
        }

        private static void UnbuggerStream(Stream stream, int timeoutMillis) {
            if (stream is GZipStream stream2) {
                stream = stream2.BaseStream;
            }
            {
                var fullName = stream.GetType().FullName;
                if (fullName == "System.Net.Http.EmptyReadStream") {
                    return;
                }

                if (
                    fullName == "System.Net.Http.HttpConnection+ChunkedEncodingReadStream" || 
                    fullName == "System.Net.Http.HttpConnection+ConnectionCloseReadStream" ||
                    fullName == "System.Net.Http.HttpConnection+ContentLengthReadStream"
                ) {
                    var _connection = GetFieldNonPublic<object>(stream, "_connection");
                    if (_connection != null) {
                        var _socket = GetFieldNonPublic<Socket>(_connection, "_socket");
                        if (_socket != null) {
                            _socket.ReceiveTimeout = timeoutMillis;
                            _socket.SendTimeout = timeoutMillis;
                            return;
                        }

                        // try proxy structure
                        var _stream = GetFieldNonPublic<Stream>(_connection, "_stream");
                        var innerStream = GetFieldNonPublic<Stream>(_stream, "_innerStream");
                        if (innerStream != null) {
                            _stream = innerStream;
                        }
                        _connection = GetFieldNonPublic<object>(_stream, "_connection");
                        _socket = GetFieldNonPublic<Socket>(_connection, "_socket");
                        if (_socket != null) {
                            _socket.ReceiveTimeout = timeoutMillis;
                            _socket.SendTimeout = timeoutMillis;
                            return;
                        }
                    }
                }
            }

            // working as of netcoreapp3.1, will likely need fixes against other runtimes
            throw new NotImplementedException();
        }

        private static T GetFieldNonPublic<T>(object obj, string fieldName) {
            var fi = obj.GetType().GetField(fieldName, BindingFlags.GetField | BindingFlags.Instance | BindingFlags.NonPublic);
            var result = fi.GetValue(obj);
            return (T)result;
        }

wfurt commented 4 years ago

Is there anything left to do here @stephentoub? (besides discouraging HttpWebRequest usage)

coderb commented 4 years ago

@wfurt fix the bug that ReadWriteTimeout is not implemented or provide a logical rationale as to why it should not be?

coderb commented 4 years ago

@wfurt the problem occurs with HttpClient as well as HttpWebRequest. Perhaps you want to deprecate all synchronous i/o calls in .net since you are refusing to support an existing API that is fundamentally required for reliable synchronous networking.

wfurt commented 4 years ago

Perhaps we should improve documentation and clarify HttpClient.Timeout better. Changing scope of the Timeout to cover response stream would be breaking change IMHO. Similar issues were raised before: people want absolute timeouts or timeouts for cases when server stop sending response e.g. idle timeout. This should perhaps wait for triage.

coderb commented 4 years ago

@wfurt This is not a documentation problem. If I am downloading a 2GB file my timeout for how long the whole process takes to complete has no relation to how long of a pause between network packets I am will to tolerate. Dont reinvent the wheel. ReadWriteTimeout is a simple call on the underlying socket. I'm at a loss as to why you refuse to make it from HttpWebRequest.

karelz commented 3 years ago

Triage: Given that this is in sync paths which are lower priority, and which are very difficult to change and make work consistently across all cases (HTTP versions and platforms), we should wait for more feedback from more customers to see if it has priority among larger customer base.

coderb commented 3 years ago

@karelz what is difficult about adding socket.ReceiveTimeout = ReadWriteTimeout; socket.SendTimeout = ReadWriteTimeout?

coderb commented 3 years ago

@karelz additionally, this should be listed in the release notes as a "Known Issue" if you are looking for feedback. this bug is very easy to miss because it may happen rarely, could be "fixed" by an application restart, and is difficult to diagnose.

coderb commented 3 years ago

@karelz if you genuinely want developer "feedback" to see how many people are relying on ReadWriteTimeout how about int ReadWriteTimeout { set { throw new InvalidOperationException(); } }. yeah, uh huh.... you don't. let's just disregard the api silently instead and see how many developer sleuths can figure out why their app is unreliable.

karelz commented 3 years ago

@coderb based on our triage discussion, there are numerous difficulties on Linux to implement this - my key takeaway was that sockets are rather tricky in this area. Regarding throwing -- we need to consider also the case of devs who never hit the timeout and their apps just happily work today. With throwing we would break them. Or perhaps even libraries that set it on behalf of apps.

While not ideal, without clear way forward and lower customer impact (only 1 report so far), we don't plan to prioritize it now. We will keep the issue open to reconsider in future and see if there are more upvotes / customers hitting the problem.

coderb commented 3 years ago

@karelz

@coderb based on our triage discussion, there are numerous difficulties on Linux to implement this - my key takeaway was that sockets are rather tricky in this area.

Why is this os dependent? socket.SendTime/ReceiveTimeout are direct o/s calls to setsockopt afaik.

Regarding throwing -- we need to consider also the case of devs who never hit the timeout and their apps just happily work today. With throwing we would break them. Or perhaps even libraries that set it on behalf of apps.

I think you have a fundamental misunderstanding of what the issue is. There are NO devs who never hit the timeout.
Whether the timeout is hit is dependent on the server response and network health. There are only devs who "haven't hit the timeout yet" or "don't realize that their app has hung".

While not ideal, without clear way forward and lower customer impact (only 1 report so far), we don't plan to prioritize it now. We will keep the issue open to reconsider in future and see if there are more upvotes / customers hitting the problem.

There are WAY more than 1 report. This issue has been reported and marked closed many times in the past. Don't sweep it under the rug. .NET 5.0/6.0 is supposed to be complete/production quality. This is not an acceptable bug for a production system.

stephentoub commented 3 years ago

socket.SendTime/ReceiveTimeout are direct o/s calls to setsockopt afaik.

Not exactly. Berkeley sockets can operate in one of two modes: blocking and non-blocking. Send/receive timeouts apply only to blocking (since non-blocking operations return immediately, either having completed the requested operation or with an error indicating it can't be completed now, try again later). In order to implement async operations, the socket must be non-blocking, which means we need to force that on the underlying socket, and in order to avoid problematic race conditions, once the socket has been configured as non-blocking, we can't put it back to be blocking, which means we then need to simulate blocking by effectively doing sync-over-async (albeit in a way that avoids requiring additional thread pool threads be involved), with an event that gets signaled when the async operation completes and blocking on that event. We then try to simulate send/receive timeouts as well, by passing the corresponding timeouts to the waits on those events and canceling the operation after that timeout has expired, but this is by no means perfect.

Even if our simulation is good enough, another OS-specific aspect is that, for example, Linux doesn't roundtrip the values particular well, e.g. it may double the set value such that reading it back later leads to getting a different value than was set, it may cap the value such that reading it back leads to a different value than was set, etc. This can then lead to issues for connection pooling, as if we need to read the original value that was set, store it, and then reset it to that original value before returning it to the pool, we could end up leaving it in a different configured state. Now, we try to work around this as well by having the Socket retain a shadow copy of the values as they were set via Socket's public properties, but that also then means it's disconnected from what the reality is potentially, e.g. if the values were set via some other means.

But, let's say that either those aren't a concern or we're willing to live with the corner cases. HttpWebRequest (which the docs explicitly call out as not being recommended anymore) is layered on top of HttpClient in .NET Core. For HttpWebRequest to support timeouts, that means the streams employed by HttpClient would also need to support timeouts. They do not: on both request and response streams, CanTimeout is false and trying to get or set ReadTimeout or WriteTimeout will result in an exception. We could change that, but only for some cases. For HTTP/2, for example, individual requests are multiplexed onto a single connection, which means one stream setting a timeout and having that translate into setting a timeout on the underlying socket would end up affecting all multiplexed operations, not just the one; in fact, this makes it challenging to effectively support synchronous operations at all. So then even if we made this work for HTTP/1.x, whether the response streams support timeouts would depend on the HTTP version actually employed for a given connection.

But, let's say we were ok with that inconsistency. All of the request response streams would then need to track changes made to the timeouts so they could be reset prior to the connection being put back into the pool. That's not "pay for play".

But, let's say we were ok with that, since we'd be talking making those stream objects just a few bytes larger and incurring potentially just a few more syscalls per request. There's still a gap here, in that it would only apply to the sending of the request body or the receiving of the response body, not the sending of request headers or the reading of response headers, since it would only apply to the portion that was controlled by the request or response streams, which the calling code (HttpWebRequest) wouldn't have access to in time to set it for the request headers or response headers.

So, even if we did all the plumbing to flow ReadWriteTimeout to the request/response streams's Read/WriteTimeouts and from there to the underlying connection and appropriately tracking the original value and resetting it and accepting whatever other deficiencies come with it, it still wouldn't address your overall gripe, that all individual read/write operations performed as part of the request/response have a timeout applied to them individually.

Now, late in .NET 5, we did add a ConnectCallback to SocketsHttpHandler. That gives the owner of the handler the ability to create/control the underlying stream used by the handler. So, for example, instead of HttpWebRequest using HttpClientHandler, it could use SocketsHttpHandler directly and supply a ConnectCallback delegate that would create the socket and configure it with the desired timeouts. It would also configure the handler to not pool, such that every single request/response would get its own connection that was torn down at the end of the connection (this is already the case for any HttpWebRequest that doesn't use defaults for a bunch of properties), avoiding concerns around different values between web requests impacting other completely unrelated requests. In theory, this is feasible and not a ton of work. However, in .NET 5 the ConnectCallback explicitly prohibits it being used for synchronous operations, meaning to employ this approach, we would either need to change HttpWebRequest's synchronous methods back to doing sync-over-async (something we were happy to ditch in .NET 5 by using HttpClient's new Send method), or we would need to remove that limitation of ConnectCallback and accept the consequences (e.g. instead of throwing an exception if a sync Send is performed with a ConnectCallback specified, throw an exception if a sync Send is performed and a ConnectCallback doesn't complete synchronously). And if we did that, we'd also need new API that would enable SocketsHttpHandler to inform the ConnectCallback whether it should be initiating the connect synchronously or asynchronously.

So, is there a possible solution? Sure. But it's not nearly as simple as "adding socket.ReceiveTimeout = ReadWriteTimeout; socket.SendTimeout = ReadWriteTimeout?"; if it were, we wouldn't be having this discussion, as it would have been done years ago.

karelz commented 3 years ago

Triage: If we ever decide to fix it, we should use approach laid out in #44876

There are thoughts about alternative solutions by having multiple cached HttpClients (one for sync and one for async)

dotnet / runtime

HttpWebRequest timeout broken with proxy/ssl, repro attached #43520