fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License
7.26k stars 618 forks source link

Fabio hangs for 30+ seconds for 204 response #206

Closed sadyou closed 7 years ago

sadyou commented 7 years ago

On receiving a '204 No Content' fabio is still assuming that a response will be sent and keeps waiting for the response. The desired behavior is for fabio to NOT listen on any response despite the HTTP connection being persistent or NOT and simply return the 204 No Content to the original consumer. We have tested this scenario with a number of other applications such as POSTMAN, SOAP-UI, etc and the correct behavior is despite the connection being persistent or not, on 204 reception the consumer application should not listen for anything else coming on the stream. A 204 is NO CONTENT. What happens in fabio is that it continues to listen for a response until the idle connection timeout is exhausted.

magiconair commented 7 years ago

This sounds like #120 which even has the same title.

From https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

10.2.5 204 No Content

The server has fulfilled the request but does not need to return an entity-
body, and might want to return updated metainformation. The response MAY
include new or updated metainformation in the form of entity-headers, which if
present SHOULD be associated with the requested variant.

If the client is a user agent, it SHOULD NOT change its document view from
that which caused the request to be sent. This response is primarily intended
to allow input for actions to take place without causing a change to the user
agent's active document view, although any new or updated metainformation
SHOULD be applied to the document currently in the user agent's active view.

The 204 response MUST NOT include a message-body, and thus is always
terminated by the first empty line after the header fields.

Can you get and post the full trace of the inbound and outbound requests and responses? I'd like to see whether the response contains a Content-Length: 0 header and has a terminating empty line and/or if the upstream server is closing the connection. I'll do some testing on my end.

magiconair commented 7 years ago

I've updated the demo server to return response with an arbitrary status code. You can run it as follows:

# start consul server (sep terminal)
consul agent -dev 

# start demo server (sep terminal)
cd $GOPATH/src/github.com/eBay/fabio/demo/server
go build
./server -prefix /foo -status 204

# start fabio (sep terminal)
cd $GOPATH/src/github.com/eBay/fabio
go build
./fabio

# query demo server directly and through fabio
curl -i 127.0.0.1:5000/foo # direct
curl -i 127.0.0.1:9999/foo # via fabio

The demo server will not return any content when run with -status 204. You can try that by using a different status code, e.g. 200 or 203. When querying the /foo endpoint both directly and via fabio I get the same result as in immediate response without hanging. Therefore, I think that there is something specific about your configuration that triggers that behavior.

sadyou commented 7 years ago

Are you sure that your demo server is using a persistent connection? if the upstream server is not using persistent connections then fabio will not wait for any response on receiving a 204 No content. This problem is seen within fabio when the server has persistent connection enabled.

sadyou commented 7 years ago

I have confirmed that no Content-Length header is sent when 204 No Content is returned. This is correct behavior as per According to section 3.3.2 of RFC 7230 (Proposed Standard, RFC 7230):

A server MUST NOT send a Content-Length header field in any response with a status code of 1xx (Informational) or 204 (No Content).

I have turned verbose on for a curl invocation and the problem is in bold below

* Connected to localhost (127.0.0.1) port 8089 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.40.0-DEV
> Host: localhost:8089
> Accept: */*
> Content-Length: 3
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 3 out of 3 bytes
< HTTP/1.1 204
< Content-Type: application/x-www-form-urlencoded; charset=windows-1252
< MULE_ENCODING: windows-1252
< Date: Wed, 21 Dec 2016 11:08:23 GMT
<
_*##  Connection #0 to host localhost left intact_

The line above is where the server has persistent connection enabled. Fabio waits for a response and will only stop when the connection idle timeout is exhausted. If I change the server config and set the persistent connection as OFF (disabled) then fabio behaves correctly

magiconair commented 7 years ago

Found https://github.com/golang/go/issues/15647 which seems related.

I'm curious: does fabio forward the 204 response and keep the connection open or does it wait 30 sec before forwarding the 204 response?

sadyou commented 7 years ago

It waits 30 seconds before forwarding the 204 response. It waits for whatever value I specify as the idle timeout. 30 seconds is our default but if I reduce it to 5 seconds then it will wait for 5 seconds before sending the 204.

magiconair commented 7 years ago

Here is a simple raw web server which only serves 204 responses and keeps the connection open after sending the response. When the response contains an empty line then fabio does what you want. If not, then it hangs. To simulate the hang remove the second \r\n from the body var.

package main

import (
    "log"
    "net"
    "time"
)

var (
    addr    = ":8080"
    timeout = 10 * time.Second
    body    = []byte("HTTP/1.1 204 No Content\r\n\r\n")
)

func main() {
    log.Println("Serving 204 responses on ", addr)
    l, err := net.Listen("tcp", addr)
    if err != nil {
        log.Fatal("listen: ", err)
    }

    for {
        c, err := l.Accept()
        if err != nil {
            log.Fatal("accept: ", err)
        }

        go func(c net.Conn) {
            defer c.Close()

            log.Printf("sending %q", string(body))
            if _, err := c.Write(body); err != nil {
                log.Fatal("write: ", err)
            }

            log.Println("waiting", timeout)
            time.Sleep(timeout)
            log.Println("closing")
        }(c)
    }
    select {}
}

Test with

# run server (sep terminal)
go run main.go

# run fabio (sep terminal)
./fabio -registry.static.routes 'route add svc / http://127.0.0.1:8080' -registry.backend static -proxy.keepalivetimeout 5s

# make request
curl -i -v http://127.0.0.1:9999/

Could you please capture a tcpdump of the roundtrip between fabio and mule? I'm suspicious that mule is not sending the response it should. Also, because this is the second time someone stumbled over exactly the same problem.

sadyou commented 7 years ago

This issue is with a MuleSoft interaction as well: Below is a dump of the response:

Thu Dec 22 09:14:39 EST 2016:DEBUG:<< "HTTP/1.1 204 [\r][\n]"
Thu Dec 22 09:14:39 EST 2016:DEBUG:<< "Content-Type: text/xml; charset=UTF-8[\r][\n]"
Thu Dec 22 09:14:39 EST 2016:DEBUG:<< "MULE_ENCODING: UTF-8[\r][\n]"
Thu Dec 22 09:14:39 EST 2016:DEBUG:<< "Date: Wed, 21 Dec 2016 22:14:39 GMT[\r][\n]"
Thu Dec 22 09:14:39 EST 2016:DEBUG:<< "[\r][\n]"

You will notice the /r/n is there but I notice in your sample you have two occurrences of this. Why should there be two?

magiconair commented 7 years ago

The blank line separates the headers from the body. It effectively terminates the header section. Without that the server can't know whether it has seen all the headers or not.

sadyou commented 7 years ago

Are you saying that the dump that I have provided will cause fabio to hang until the idle timeout is exhausted?

magiconair commented 7 years ago

What I'm saying is that without the blank line after the headers fabio is waiting for more content since the header section hasn't been properly terminated yet. AFAIR, this is part of the HTTP protocol. I'll dig some more but this looks like a mule bug. Obviously you won't see this if mule closes the connection.

magiconair commented 7 years ago

https://tools.ietf.org/html/rfc2616#section-4.1 defines an HTTP message as

        generic-message = start-line
                          *(message-header CRLF)
                          CRLF
                          [ message-body ]
        start-line      = Request-Line | Status-Line

which means that there must be a blank line after the message headers. Since mule does not seem to send the final CRLF fabio waits for more headers until it times out. So to that extent I'd consider this a mule bug. Maybe dig through their issues to see whether someone else has reported it.

sadyou commented 7 years ago

I have raised a support ticket with Mule in parallel and my support team in MuleSoft is across this dialogue as well. Their argument at this stage is that any other client whether it be POSTMAN, SOAP-UI, curl, tibco,etc seem to behave differently to fabio at this stage. I will do some more digging on my side as well.

magiconair commented 7 years ago

If they don't send the empty line then they are not behaving according to spec IMO and the other clients are more forgiving. My guess is that any Go based reverse proxy will show this problem.

sadyou commented 7 years ago

I have attached below the raw tcpdump of the request and response

504f5354202f20485454502f312e310d0a486f73743a203132372e302e302e313a383038390d0a557365722d4167656e743a206375726c2f372e35312e300d0a4163636570743a202a2f2a0d0a436f6e74656e742d4c656e6774683a20330d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a323034

485454502f312e3120323034200d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465643b20636861727365743d5554462d380d0a4d554c455f454e434f44494e473a205554462d380d0a446174653a205468752c2032322044656320323031362032323a34383a313020474d540d0a0d0a

magiconair commented 7 years ago

@sadyou This is just the hex content and not a base64 encoded tcpdump capture, right?

sadyou commented 7 years ago

The above is the raw content. I have also attached below the UTF-8 content from wireshark

POST / HTTP/1.1 Host: 127.0.0.1:8089 User-Agent: curl/7.51.0 Accept: / Content-Length: 3 Content-Type: application/x-www-form-urlencoded

204HTTP/1.1 204 Content-Type: application/x-www-form-urlencoded; charset=UTF-8 MULE_ENCODING: UTF-8 Date: Thu, 22 Dec 2016 22:48:10 GMT

END

magiconair commented 7 years ago

The dump of the response you've provided contains the final CRLF and that works just fine. Either this isn't the content that is being sent to fabio or something else is going on. You can try it by replacing the body variable with this:

   // body    = []byte("HTTP/1.1 204 No Content\r\n\r\n")
    body = []byte{
        0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31,
        0x20, 0x32, 0x30, 0x34, 0x20, 0x0d, 0x0a, 0x43,
        0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x54,
        0x79, 0x70, 0x65, 0x3a, 0x20, 0x61, 0x70, 0x70,
        0x6c, 0x69, 0x63, 0x61, 0x74, 0x69, 0x6f, 0x6e,
        0x2f, 0x78, 0x2d, 0x77, 0x77, 0x77, 0x2d, 0x66,
        0x6f, 0x72, 0x6d, 0x2d, 0x75, 0x72, 0x6c, 0x65,
        0x6e, 0x63, 0x6f, 0x64, 0x65, 0x64, 0x3b, 0x20,
        0x63, 0x68, 0x61, 0x72, 0x73, 0x65, 0x74, 0x3d,
        0x55, 0x54, 0x46, 0x2d, 0x38, 0x0d, 0x0a, 0x4d,
        0x55, 0x4c, 0x45, 0x5f, 0x45, 0x4e, 0x43, 0x4f,
        0x44, 0x49, 0x4e, 0x47, 0x3a, 0x20, 0x55, 0x54,
        0x46, 0x2d, 0x38, 0x0d, 0x0a, 0x44, 0x61, 0x74,
        0x65, 0x3a, 0x20, 0x54, 0x68, 0x75, 0x2c, 0x20,
        0x32, 0x32, 0x20, 0x44, 0x65, 0x63, 0x20, 0x32,
        0x30, 0x31, 0x36, 0x20, 0x32, 0x32, 0x3a, 0x34,
        0x38, 0x3a, 0x31, 0x30, 0x20, 0x47, 0x4d, 0x54,
        0x0d, 0x0a, 0x0d, 0x0a,
    }
sadyou commented 7 years ago

This issue can be closed now. We have identified the problem is with Mule. In the scenario Client->Fabio->Mule everything is OK when Mule sends 204 back and the new line character is there. In the scenario Client->Fabio->Mule-Mule, the second Mule listener will send the 204 and the first Mule listener will copy the 204 status back to Fabio but it omits the new line.

magiconair commented 7 years ago

@sadyou That's good to know. Do I understand this correctly that this is a bug in Mule when Mule gets a 204 response from another Mule instance?