fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License
7.27k stars 616 forks source link

Add HTTP/2 support for gRPC #244

Open cosaques opened 7 years ago

cosaques commented 7 years ago

Is it possible to plug some distributed tracing system (like Zipkin) directly to fabio (like for Linkerd) ? If not how it is better to implement this ?

magiconair commented 7 years ago

What are you trying to achieve?

cosaques commented 7 years ago

Hi @magiconair ! Our system will consist of microservices communicating via gRPC with each other. We use Nomad to run them and fabio for load balancing. Now we try to achieve the distributed tracing (trace the processes that passes through different microservices). To adapt it I see 2 solutions :

  1. manually add the code that traces inside each microservice (using Zipkin for example)
  2. switch to Linkerd that allows usage of Zipkin inside of it without need to add some code in microservices. I would like to know how if it is possible to achieve the same result (solution 2) using fabio. Thanks in advance for any help and sorry if I put my question in inappropriate place
magiconair commented 7 years ago

If I understand you correctly, you want fabio to start, collect and report the spans for your microservices. This should be zipkin protocol compliant with gRPC as your transport, right?

This sounds like an interesting idea. Do you want to volunteer implementing this?

cosaques commented 7 years ago

@magiconair , yes it's exactly this! I would like to try to implement this. Could you just give some advises how it is better to do it ? (coding always on C#, I'm not yet really strong in Go :) ) Also the question about gRPC. Do you plan to continue to support gRPC in fabio ?

magiconair commented 7 years ago

You already have metrics in fabio which report the throughput and latency per endpoint. Your services could just announce multiple endpoints to get different metrics. I'm not a gRPC expert but I assume that you can use different transport protocols for it. Are you exposing gRPC on the outside then?

If you move the zipkin layer into your frontend services then you could also route via the raw tcp service that I've just added to the 1.4 release branch.

magiconair commented 7 years ago

As for supporting gRPC: fabio can route HTTP and HTTPS and soon TCP. If gRPC can work with these transport protocols then it is already supported. What else would you need?

cosaques commented 7 years ago

@magiconair thanks a ton for these inputs. Concerning gRPC, it uses HTTP/2. As I saw the check-in with the comment "Re-enable Http/2" I wonder what is the future plans for supporting http/2 ? Thanks again

magiconair commented 7 years ago

The re-enable http/2 patch was a regression. HTTP/2 has been supported for incoming connections for quite some time but fabio still lacks the HTTP/2 outbound connections which is probably required for gRPC. This is something that I have on my plate for the next weeks together with the HTTPS support for upstream servers. The refactoring in the 1.4 branch paves the way for that.

However, one of the issues with the typical HTTP/2 and HTTPS upstream support is the re-encryption of traffic which uses CPU resources and is essentially a man-in-the-middle attack on your data stream by breaking end-to-end encryption.

Fabio has a TCP+SNI proxy mode where it routes the encrypted connection via the server name in the TLS connection without decryption (e.g. urlprefix-user.app.com/). You can't do path based routing this way but depending on your application that may be enough.

wv-myoung commented 7 years ago

@magiconair was there any progress towards an HTTP/2 supported upstream that you mentioned with the HTTPS upstream support? I only am seeing activity related to the HTTPS work but not HTTP/2.

I am also investigating load balancing options for gRPC and Fabio looks like a potentially great fit to the overall design short of the need on the HTTP/2 upstream.

I've tested and validated load balancing functionality with nghttpx but that does not have the consul driven configuration of fabio.

I was able to use fabio with the tcp load balancing but that does not balance per request for gRPC since the requests are multiplexed inside of the connection. It is only balancing on connection initiation.

jippi commented 7 years ago

A gentle bump to this issue, any 2017 / 2018 timeframe @magiconair ? :)

magiconair commented 7 years ago

From reading the docs this should just work out of the box. @jippi did you test that this doesn't work? I'll have a look this week.

wv-myoung commented 7 years ago

I have been looking into this and yes it may work out of the box when the http/1.1 to http/2 upgrade negotitation is handled for normal browser and web servers. This is not the case with gRPC as it is only an http/2 protocol on client and server it is acting strictly on http/2.

I added an additional proto value of h2 to indicate the upstream should be explicitly http/2 and used an appropriate transport for that.

The other part is that gRPC uses trailers and the handling of trailers in the core library ReverseProxy was not compatible with that for http/2 purposes. This has now been addressed but did not appear to be in 1.8.3. See issue https://github.com/golang/go/issues/20437 and commit https://github.com/golang/go/commit/1611839b29a5447f3132e93f14c75b1639a34490 addressing it.

gRPC will fail without the grpc-message and grpc-status trailers passed back with the response and there are others that may also be passed.

I am planning to contribute my changes but as we have not done so previously I need to go through steps to make sure I have clearance to do so. I can discuss it further in the meantime.

magiconair commented 7 years ago

@wv-myoung I can easily fork the net/http/httputil package for the moment until 1.9 is out and add the change if that's all it takes. I think I need to add something to the TLSClientConfig as well. I've tried to simulate this with the demo/server but didn't succeed yet. It might be worthwhile to add a gRPC mode there as well.

The main question for me is whether this can work on a regular https listener with auto-detection or whether we need a separate proto=grpc listener?

wv-myoung commented 7 years ago

@magiconair I did not have to manipulate the listener at all and do not think there would be a need for a dedicated type. The routing works well as the service name, class and action go into the uri such as /TestServer.HelloWorld/SayHello. Therefore port and prefix matching are both capable for the routing, I do believe host was working as well.

TLSClientConfig worked fine with the existing settings for handling TLSSkipVerify. I do want to further explore unsecured gRPC which I hadn't yet get to function through fabio but does work client to server directly. The http/2 transport has an option for AllowHTTP but it didn't seem to help and comment states it doesn't enable h2c so not sure if it is the correct approach.

wv-myoung commented 7 years ago

I was attempting to treat it generically as http/2 instead of specifically gRPC. The trailers was the only thing I had to do gRPC specific and the net/http/util fix will address that.

magiconair commented 7 years ago

ok, that's good to know. Then I'll add a gRPC test to the integration test and see where that fails.

jippi commented 7 years ago

amazing @magiconair & @wv-myoung !

magiconair commented 7 years ago

@wv-myoung and @jippi I've pushed a change with a test. I think the main catch for enabling http2 is to call http2.ConfigureTransport in main.go since I didn't have to change anything in the proxy code itself to make this work.

I've created a gRPC service and server which I'm testing in an integration test which fails with go1.8 but works with gotip. I'd like to extend the demo/server to serve gRPC and write a simple gRPC client for that server to test this end-to-end since the code in the integration test does not use the makeTransport function which is used in main.go.

$ ~/go1.8.3/bin/go test -v -run GRPC github.com/fabiolb/fabio/proxy
=== RUN   TestGRPC
2017/06/14 13:08:57 gRPC echo server listening on  https://127.0.0.1:60710/echosvc.EchoSvc
2017/06/14 13:08:57 http/2.0 server listening on  https://127.0.0.1:60711
=== RUN   TestGRPC/https_direct
=== RUN   TestGRPC/https_via_proxy
=== RUN   TestGRPC/gRPC_direct
=== RUN   TestGRPC/gRPC_via_proxy
--- FAIL: TestGRPC (0.01s)
    --- PASS: TestGRPC/https_direct (0.00s)
    --- PASS: TestGRPC/https_via_proxy (0.00s)
    --- PASS: TestGRPC/gRPC_direct (0.00s)
    --- FAIL: TestGRPC/gRPC_via_proxy (0.00s)
        grpc_integration_test.go:197: echosvc.Send failed:  rpc error: code = Internal desc = server closed the stream without sending trailers
FAIL
2017/06/14 13:08:57 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:60710->127.0.0.1:60717: use of closed network connection
exit status 1
FAIL    github.com/fabiolb/fabio/proxy  0.027s

$ ~/gotip/bin/go version
go version devel +17ba830f46 Wed Jun 14 06:05:05 2017 +0000 darwin/amd64

$ ~/gotip/bin/go test -v -run GRPC github.com/fabiolb/fabio/proxy
=== RUN   TestGRPC
2017/06/14 13:08:51 gRPC echo server listening on  https://127.0.0.1:60697/echosvc.EchoSvc
2017/06/14 13:08:51 http/2.0 server listening on  https://127.0.0.1:60698
=== RUN   TestGRPC/https_direct
=== RUN   TestGRPC/https_via_proxy
=== RUN   TestGRPC/gRPC_direct
=== RUN   TestGRPC/gRPC_via_proxy
--- PASS: TestGRPC (0.01s)
    --- PASS: TestGRPC/https_direct (0.00s)
    --- PASS: TestGRPC/https_via_proxy (0.00s)
    --- PASS: TestGRPC/gRPC_direct (0.00s)
    --- PASS: TestGRPC/gRPC_via_proxy (0.00s)
PASS
2017/06/14 13:08:51 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:60697->127.0.0.1:60704: use of closed network connection
ok      github.com/fabiolb/fabio/proxy  0.025s

There is still the stray log line about the closed network connection. I need to see where this comes from. What was also unexpected is when you call w.WriteHeader(0) you get a missing status pseudo header response. This happens when the HTTPProxy has no NoRouteStatus code configured.

I would appreciate if you could review the test setup since it took my some time to cobble it together. You most likely have some more experience with this.

wv-myoung commented 7 years ago

@magiconair I was able to confirm usage of http2.ConfigureTransport along with latest go from master to address the trailers functional with my gRPC harness as well.

Running your above test from the issue-244-grpc branch I am not seeing the closed network log line mentioned

~ # go version
go version master-nightly-17ba830f4663816c3270860fad96373a833a3b26 linux/amd64
~ # go test -v -run GRPC github.com/fabiolb/fabio/proxy
=== RUN   TestGRPC
2017/06/14 20:36:37 gRPC echo server listening on  https://127.0.0.1:40617/echosvc.EchoSvc
2017/06/14 20:36:37 http/2.0 server listening on  https://127.0.0.1:40451
=== RUN   TestGRPC/https_direct
=== RUN   TestGRPC/https_via_proxy
=== RUN   TestGRPC/gRPC_direct
=== RUN   TestGRPC/gRPC_via_proxy
--- PASS: TestGRPC (0.02s)
    --- PASS: TestGRPC/https_direct (0.00s)
    --- PASS: TestGRPC/https_via_proxy (0.00s)
    --- PASS: TestGRPC/gRPC_direct (0.00s)
    --- PASS: TestGRPC/gRPC_via_proxy (0.00s)
PASS
ok      github.com/fabiolb/fabio/proxy  0.023s

The issue does remain that it only works over https both grpcClient to fabio and fabio to grpcServer and so grpc services need to be tagged with the proto=https option. I would consider that a separate issue for later review as this does give a working path.

magiconair commented 7 years ago

@wv-myoung I'm not sure I understand the last statement. The upstream server needs proto=https since go does not support HTTP/2.0 without https? Why would the client need a tag? What am I missing?

wv-myoung commented 7 years ago

@magiconair I was stating that both the fabio proxy listener and upstream need to be configured for https for it to work currently. Even though if you go direct from grpc client to server it is able to work without https.

For the grpc service registering itself with consul to be served via fabio it needs to include the proto=https option.

As well, the listener that the grpc client connects to fabio on will need an appropriate certificate store configured.

Both grpc client and server also have settings for client certificate, ignoring validation issues and on the server side certificate and key values so those need to be done appropriately as well.

Essentially, all 4 edges (client, fabio listener, fabio upstream request and server) all need proper https configuration in place for it to currently work from my testing.

Ideally, we would be able to also use unsecured transports on either or both legs of the communication based upon use case.

magiconair commented 7 years ago

Does gRPC work with HTTP 1.1? Go does not support HTTP/2 without TLS. The tlsskipverify option disables cert verification in fabio.

wv-myoung commented 7 years ago

gRPC is HTTP/2 only and supports use of h2c. I've tested it with go based grpc client and server with http2debug=2 and can see that it is over HTTP/2 as shown below (note the scheme is http).

I believe the gRPC libraries may not be using the net/http2 libraries or using them in a lower level fashion as I have seen framer handling within their code.

This is mainly just a note that at this time that using fabio in the middle will require that it is https traffic. Perhaps later if there is better h2c support implemented to the core libraries then that will change.

2017/06/16 16:16:46 http2: Framer 0xc4201161c0: wrote SETTINGS len=0
2017/06/16 16:16:46 http2: Framer 0xc4201161c0: wrote WINDOW_UPDATE len=4 (conn) incr=983025
2017/06/16 16:16:46 http2: Framer 0xc4201161c0: read SETTINGS len=0
2017/06/16 16:16:46 http2: Framer 0xc4201161c0: read WINDOW_UPDATE len=4 (conn) incr=983025
2017/06/16 16:16:46 http2: Framer 0xc4201161c0: read HEADERS flags=END_HEADERS stream=1 len=73
2017/06/16 16:16:46 http2: decoded hpack field header field ":method" = "POST"
2017/06/16 16:16:46 http2: decoded hpack field header field ":scheme" = "http"
2017/06/16 16:16:46 http2: decoded hpack field header field ":path" = "/myWorld.World/SayHello"
2017/06/16 16:16:46 http2: decoded hpack field header field ":authority" = "localhost:5100"
2017/06/16 16:16:46 http2: decoded hpack field header field "content-type" = "application/grpc"
2017/06/16 16:16:46 http2: decoded hpack field header field "user-agent" = "grpc-go/1.4.0-dev"
2017/06/16 16:16:46 http2: decoded hpack field header field "te" = "trailers"
2017/06/16 16:16:46 http2: Framer 0xc4201161c0: read DATA flags=END_STREAM stream=1 len=12 data="\x00\x00\x00\x00\a\n\x05Test0"
magiconair commented 7 years ago

Well, if we can make it work with all transports then we should. I'll have a look whether I can make this work with http. Help is appreciated.

DennisPersson commented 6 years ago

Any updates on when / if the issue-244-grpc branch is expected to be merged to master? We have confirmed that this branch seems to work fine proxying our gRPC service (using proto=https and tlsskipverify=true).

But we cannot get it to work with the latest release, even when building fabio with go1.10rc1, where the fix mentioned above (golang/go@1611839) is supposed to be included.

Thanks!

magiconair commented 6 years ago

@DennisPersson Oh wow, I forgot about this one. I will have another look but if you want to help then this would be more than welcome.

DennisPersson commented 6 years ago

@magiconair Thanks for the quick response. Sorry, I don't think I'm familiar enough with Go to help with the coding, would be happy to help with another round of testing before you release if needed though.

magiconair commented 6 years ago

@DennisPersson I've rebased the branch, fixed go vet issues and added the missing dependencies. I need to get up to speed again on the state.

magiconair commented 6 years ago

@DennisPersson I think it would help if you test the branch and let me know what works and what doesn't. The more specific the better.

DennisPersson commented 6 years ago

I did some more testing on the rebased branch; regular service calls seems to work fine still. I noticed this warning message in our Java service, but calls and responses go through (this was present before the rebase as well):

2018-02-07 14:17:25,277 [grpc-default-worker-ELG-4-2] WARN io.grpc.netty.NettyServerHandler - Expected header TE: trailers, but null is received. This means some intermediate proxy may not support trailers

I suspected the above was due to the Go bug mentioned above, so I tried to rebuild with go1.10.rc1, but still got the same message.

One thing that didn't seem to work is the ProtoReflectionService that is included in gRPC-java (https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md). That didn't work before the rebase either though.

I think the difference between that service and our own service might be that the ProtoReflectionService streams back its responses, not sure if that is something that is even possible to support. I've attached the log file from the ProtoReflectionService call in case it is helpful.

reflectionServiceCall.log

We don't use any streaming in our own services, so for us that use case is not important.

magiconair commented 5 years ago

I've somewhat lost track of this. Could someone interested in this check whether #575 solves this use-case?

sunliusi commented 4 years ago

grpc client will disconnected on server stream.