drbrain / net-http-persistent

Thread-safe persistent connections with Net::HTTP
http://seattlerb.rubyforge.org/net-http-persistent
339 stars 117 forks source link

Response body different from what it should be #46

Closed markoudev closed 11 years ago

markoudev commented 11 years ago

We introduced net-http-persistent in our application in order to get rid of the HTTP overhead, since we do a lot of them (~30 requests per second). We perform requests that return an XML document which are then parsed. Our request are using different hosts in the HTTP header, but are targeted at the same IP-address.

However, we've seen some disturbing behaviour where the response XML of a request was completely different from what we expected it to be. In fact, it was the response of a completely different request. It sure sounds like something that may have to do with threading when a lot of requests are being performed.

We're using version 1bf9703b of net-http-persistent and ruby 1.8.7 using passenger 4.0.5. Any ideas to what this could be?

drbrain commented 11 years ago

net-http-persistent does not share connections across threads so this should not happen in ordinary use.

I would edit Net::HTTP::Persistent#request to log "thread #{Thread.current.object_id} connection #{connection.object_id} request #{request.object_id} #{request.request_uri}" both after the connection is retrieved and just before the response is returned. Seeing the same connection used by two threads is a sign of something bad happening.

PS: I recommend only using released versions. I suppose I need to release net-http-persistent, though.

markoudev commented 11 years ago

Our application servers are single threaded, so that shouldn't be a problem. I took the app up with the log line you proposed, and received an error that told me the response body was something else from what we expected. This is what I saw in our logs.

The response XML is not what we expected it to be. First of all, it should start with an XML header and somehting about presenceLogs, instead of activities. The thread, connection and request object ids look fine. The request object id is unique throughout the log file. Any thoughts?

POST DATA: <presenceLogs type="array">...</presenceLogs>
POST https://xxx/presence_logs.xml
[net-http-persistent] thread 7663920 connection 56664240 request 53920900 uri https://xxx/presence_logs.xml
[net-http-persistent] thread 7663920 connection 56664240 request 53920900 uri https://xxx/presence_logs.xml
--> 200 OK 0.07KB (5.1ms)
"<profileActivity type=\"array\"></profileActivity>\n"
drbrain commented 11 years ago

Since it's single threaded the problem must not be where I thought.

How do you construct and dispatch your requests? Is it something like this:

http = Net::HTTP::Persistent.new
uri = URI 'https://192.0.2.1/presence_logs.xml'

req1 = Net::HTTP::Post.new uri.path
req1['Host'] = 'host1.example'
# […]

http.request uri, req1

req2 = Net::HTTP::Post.new uri.path
req2['Host'] = 'host2.example'
# […]

http.request uri, req2

Most specifically, I'm interested in know if you're using the same URI for requests to different hosts. There may be a problem here I hadn't thought of before.

markoudev commented 11 years ago

We use a gem called Ahora, which glues together Faraday and Nibbler, to parse our requests. This also takes care of the connection, but allows for customisation in the implementing application (see this line).

In our application we have a file that takes care of the connection configuration, which looks like this:

require 'faraday_middleware'
require 'faraday/adapter/net_http_persistent'

def extend_middleware(builder)
  builder.use DnsResolverMiddleware, dns_resolver # This was created using DnsResolver.new
end

DnsResolverMiddleware = Struct.new(:app, :resolver) do
  def call env
    host = env[:url].host
    env[:request_headers][:host] = host
    env[:url].host = resolver.resolve host
    app.call env
  end
end

So now we just use Faraday to handle the connections. What problem could there be do you think?

Thanks for helping out!

mlangenberg commented 11 years ago

As we dispatch requests to almost 500 different backend servers (https://tenant-1.example.com/ to https://tenant-500.example.com/), which are exposed through only a handful of different IP's, before making the request, we resolve the ip-address.

This optimization allows us to use less than 10 persistent http connections instead of potentially 500 per app server (per passenger worker).

We are not re-using the actual URI instance object, if that is what you are referring to. We do however connect to the same uri, https://ip-address/, with the header Host: tenant-1.example.com set.

drbrain commented 11 years ago

Since you are connecting to the same URI I have a hypothesis:

  1. Send request to http://192.0.2.1/presence_logs.xml (inside net-http-persistent)
  2. Timeout or other exception (inside net/http the socket remains open so the request is outstanding)
  3. Server sends response which the OS buffers
  4. New request by user which uses the same connection
  5. net-http-persitent reads the response to the first request (as the persistent connection was reused)

net-http-persistent did not rescue all exceptions and close the connection, so can you try @723a00e

markoudev commented 11 years ago

Your hypothesis seemed to be correct. We took it up on our application servers and kept monitoring both performance and errors. The average response time for each request decreased approximately 40-50ms using persistent connections, and we haven't seen any errors concerning different response bodies like we did before. Hopefully this is empirical proof that the hypothesis and fix are correct. Thanks!

drbrain commented 11 years ago

Great, I will release a new version shortly.