envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Jaeger sandbox is broken #6281

Closed venilnoronha closed 5 years ago

venilnoronha commented 5 years ago

Title: Jaeger sandbox is broken

Description:

Jaeger sandbox should run as explained in https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/jaeger_tracing.html; however, jaeger-tracing_front-envoy_1 exits with an error.

Repro steps:

$ cd examples/jaeger-tracing
$ docker-compose up --build -d
$ docker-compose ps
            Name                          Command               State                                                        Ports
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
jaeger-tracing_front-envoy_1   /usr/bin/dumb-init -- /bin ...   Exit 1
jaeger-tracing_jaeger_1        /go/bin/all-in-one-linux - ...   Up       14250/tcp, 14268/tcp, 0.0.0.0:16686->16686/tcp, 5775/udp, 5778/tcp, 6831/udp, 6832/udp, 0.0.0.0:9411->9411/tcp
jaeger-tracing_service1_1      /bin/sh -c /usr/local/bin/ ...   Up       10000/tcp, 80/tcp
jaeger-tracing_service2_1      /bin/sh -c /usr/local/bin/ ...   Up       10000/tcp, 80/tcp

Logs:

$ docker logs jaeger-tracing_front-envoy_1
[2019-03-13 18:09:57.257][000008][info][main] [source/server/server.cc:202] initializing epoch 0 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
[2019-03-13 18:09:57.257][000008][info][main] [source/server/server.cc:204] statically linked extensions:
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:206]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:209]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:212]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:215]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:217]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:219]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:222]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
[2019-03-13 18:09:57.258][000008][info][main] [source/server/server.cc:225]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
[2019-03-13 18:09:57.296][000008][critical][main] [source/server/server.cc:80] error initializing configuration '/etc/front-envoy.yaml': Unable to parse JSON as proto (INVALID_ARGUMENT:(static_resources.listeners[0].filter_chains[0].filters[0]) typed_config: Cannot find field.): {"tracing":{"http":{"typed_config":{"shared_span_context":false,"collector_endpoint":"/api/v1/spans","collector_cluster":"jaeger","@type":"type.googleapis.com/envoy.config.trace.v2.ZipkinConfig"},"name":"envoy.zipkin"}},"admin":{"address":{"socket_address":{"port_value":8001,"address":"0.0.0.0"}},"access_log_path":"/dev/null"},"static_resources":{"clusters":[{"load_assignment":{"endpoints":[{"lb_endpoints":[{"endpoint":{"address":{"socket_address":{"port_value":80,"address":"service1"}}}}]}],"cluster_name":"service1"},"http2_protocol_options":{},"name":"service1","connect_timeout":"0.250s","type":"strict_dns","lb_policy":"round_robin"},{"load_assignment":{"endpoints":[{"lb_endpoints":[{"endpoint":{"address":{"socket_address":{"port_value":80,"address":"jaeger"}}}}]}],"cluster_name":"jaeger"},"name":"jaeger","connect_timeout":"1s","type":"strict_dns","lb_policy":"round_robin"}],"listeners":[{"filter_chains":[{"filters":[{"typed_config":{"route_config":{"virtual_hosts":[{"routes":[{"decorator":{"operation":"checkAvailability"},"route":{"cluster":"service1"},"match":{"prefix":"/"}}],"domains":["*"],"name":"backend"}],"name":"local_route"},"@type":"type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager","codec_type":"auto","stat_prefix":"ingress_http","http_filters":[{"typed_config":{},"name":"envoy.router"}],"tracing":{"operation_name":"egress"}},"name":"envoy.http_connection_manager"}]}],"address":{"socket_address":{"port_value":80,"address":"0.0.0.0"}}}]}}
[2019-03-13 18:09:57.296][000008][info][main] [source/server/server.cc:500] exiting

venilnoronha commented 5 years ago

I'm hitting this issue consistently with Zipkin and Jaeger Native as well.

moderation commented 5 years ago

When I tested https://github.com/envoyproxy/envoy/pull/6025 I validated all of the new configs for OT, Zipkin and Lightstep. Either way I can probably take a look at this tomorrow if you can wait.

mattklein123 commented 5 years ago

@rnburn @objectiser

objectiser commented 5 years ago

Looks like the updated yaml files from #6025 are being used, but possibly with an older version of envoy?
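
One way to read the critical log line from the report: the parenthesised part of the message names the proto path and the field that the running binary could not resolve. A minimal sketch, assuming the message keeps the shape seen in that one log line (the regular expression and the `parse_proto_error` helper are illustrative and not part of Envoy):

```python
import re

# Shape assumed from the single log line in this issue; other Envoy
# versions may word the message differently.
PROTO_ERROR = re.compile(
    r"Unable to parse JSON as proto "
    r"\((?P<code>[A-Z_]+):\((?P<path>[^)]*)\) (?P<field>\w+): (?P<reason>[^)]*)\)"
)


def parse_proto_error(line):
    """Return the status code, proto path, field and reason, or None."""
    match = PROTO_ERROR.search(line)
    return match.groupdict() if match else None


# Abbreviated copy of the critical line from the report (JSON dump elided).
sample = (
    "error initializing configuration '/etc/front-envoy.yaml': "
    "Unable to parse JSON as proto (INVALID_ARGUMENT:"
    "(static_resources.listeners[0].filter_chains[0].filters[0]) "
    "typed_config: Cannot find field.): {...}"
)

info = parse_proto_error(sample)
print(info["path"], "->", info["field"])
```

Here the unresolved field is `typed_config` on the first listener filter, which fits the theory that the image in use predates the configs from #6025.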

moderation commented 5 years ago

After some testing, the only change required to make this work is to ensure that the docker-compose commands that build and launch the containers use the latest Envoy images. Users with no old images in their docker images list won't hit this error.

@objectiser is right in that if you have old images they won't handle the new configs.

Sequence of commands is:

$ pwd
envoy/examples/jaeger-tracing
$ docker pull envoyproxy/envoy
$ docker pull envoyproxy/envoy-alpine
$ docker-compose up --build -d
$ docker-compose ps

@venilnoronha please validate and I'll create a docs PR to add the two docker pull commands.

For those wondering, there doesn't appear to be a way to make the docker-compose.yaml file pull images the way the docker-compose pull command does (see https://docs.docker.com/compose/reference/build/). There are all sorts of crazy suggestions out there, like deleting all images before doing a docker-compose up. Running docker-compose pull before up might work, but one only has so much time to waste on things related to Docker.
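
The pull-then-rebuild sequence above could be wrapped in a small script so the base images are always refreshed first. This is a rough sketch, not something from the Envoy repository: `refresh_commands` and `run_all` are hypothetical helper names, and the working directory assumes the script is started from the root of an Envoy checkout.

```python
import subprocess

# Image names are the two pulled in the command sequence above.
IMAGES = ("envoyproxy/envoy", "envoyproxy/envoy-alpine")


def refresh_commands(images=IMAGES):
    """Build the argv lists: pull each base image, then rebuild and start."""
    commands = [["docker", "pull", image] for image in images]
    commands.append(["docker-compose", "up", "--build", "-d"])
    commands.append(["docker-compose", "ps"])
    return commands


def run_all(commands, cwd="examples/jaeger-tracing", runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        runner(argv, cwd=cwd, check=True)


# Print the plan only; call run_all(refresh_commands()) from the Envoy
# checkout to actually execute it.
for argv in refresh_commands():
    print(" ".join(argv))
```

Passing `check=True` makes a failed pull abort the run before `docker-compose up` can fall back to a stale local image.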

venilnoronha commented 5 years ago

Pulling the latest copy of the Docker images worked. Thanks!

moderation commented 5 years ago

Jaeger Native is failing for me as the download and install of libjaegertracing_plugin.linux_amd64.so is busted. Will look into this. I recall it was a pain when setting up on Linux.

venilnoronha commented 5 years ago

I didn't get much further before hitting another issue. I think I'm clearly missing something.

The docs don't mention anything about docker-machine except in the curl command. I assumed that I had to create a machine, so I went ahead and executed the following command.

$ docker-machine create --driver virtualbox default

Then, when I execute the curl command, I observe the following.

$ curl -v $(docker-machine ip):8000/trace/1
*   Trying 192.168.99.100...
* TCP_NODELAY set
* Connection failed
* connect to 192.168.99.100 port 8000 failed: Connection refused
* Failed to connect to 192.168.99.100 port 8000: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 192.168.99.100 port 8000: Connection refused

What am I missing?

moderation commented 5 years ago

@venilnoronha I've done a bunch of testing on this and I think we should drop any reference to docker-machine. It is not required to run any of the sandboxes, and it adds to the confusion and difficulty of getting the sandboxes running.

So you can replace $(docker-machine ip) and $(docker-machine ip default) with localhost and everything should work.
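
The substitution described above can be applied mechanically to commands copied from the docs. A minimal sketch, assuming the docs only use the two forms mentioned here (the pattern and the `use_localhost` helper are illustrative):

```python
import re

# Matches $(docker-machine ip) with an optional machine name such as "default".
DOCKER_MACHINE_IP = re.compile(r"\$\(docker-machine ip(?:\s+[\w.-]+)?\)")


def use_localhost(command):
    """Replace every docker-machine IP substitution with localhost."""
    return DOCKER_MACHINE_IP.sub("localhost", command)


print(use_localhost("curl -v $(docker-machine ip):8000/trace/1"))
print(use_localhost("curl -v $(docker-machine ip default):8000/trace/1"))
```

Both calls print `curl -v localhost:8000/trace/1`.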

Lastly, I found bugs such as the front envoy in the jaeger-native sandbox sending traces to the wrong port, so the traces never showed up in Jaeger.

I need to modify the CORS sandbox to not use docker machines.

tl;dr: a biggish documentation PR is coming. Please test with localhost as per above.

venilnoronha commented 5 years ago

Yes, localhost instead of $(docker-machine ip) actually worked. Awesome, thanks!

venilnoronha commented 5 years ago

I've tested both Jaeger Native and Zipkin tracing sandboxes.