census-instrumentation / opencensus-python

A stats collection and distributed tracing framework
Apache License 2.0

Jaeger exporter not working with collector #878

Open 23doors opened 4 years ago

23doors commented 4 years ago

Describe your environment.

opencensus==0.7.7
opencensus-ext-jaeger==0.7.1
opencensus-ext-zipkin==0.2.2

Running Jaeger all-in-one with Zipkin enabled:

  jaeger:
    image: jaegertracing/all-in-one:1.17
    environment:
      - COLLECTOR_ZIPKIN_HTTP_PORT=9411
    ports:
      - "5775"
      - "6831"
      - "6832"
      - "5778"
      - "16686:16686"
      - "14268:14268"
      - "14250"
      - "9411:9411"

Steps to reproduce.

Using the tracer with the collector endpoint, JaegerExporter(service_name='my-service', host_name='localhost', port=14268), raises an EOFError. It is caught here: https://github.com/census-instrumentation/opencensus-python/blob/master/contrib/opencensus-ext-jaeger/opencensus/ext/jaeger/trace_exporter/__init__.py#L382, but it is never printed, because the handler neither calls str() on the exception nor interprets its message.
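The silent failure is easier to see in isolation. A minimal sketch, assuming the EOFError is raised without a message (the exception object and the log text below are illustrative, not copied from the exporter):

```python
import logging

err = EOFError()  # assumption: raised with no arguments, so it carries no message

as_str = str(err)    # empty string, so logging only the message shows nothing
as_repr = repr(err)  # keeps the exception type visible

try:
    raise err
except Exception as exc:
    # %r logs the type even when the message is empty;
    # logging.exception(...) would additionally include the traceback.
    logging.error("Jaeger export failed: %r", exc)
```

Logging the repr (or the traceback) rather than only the message makes this kind of empty exception show up in the output.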

For whatever reason, Jaeger is responding with 400 Bad Request.

ZipkinExporter(service_name='my-service', host_name='localhost', port=9411) works just fine with the same setup.

Rmaan commented 3 years ago

I can confirm: Jaeger support is basically broken. After 3 days of debugging, I found the problem.

Inside opencensus.ext.jaeger.trace_exporter.Collector.emit, the code calls the Thrift service method self.client.submitBatches([batch]), but that's totally wrong. Instead, it should encode the batches and send them directly (it shouldn't make any Thrift RPC call).

Source: https://www.jaegertracing.io/docs/1.13/apis/#thrift-over-http-stable

The Batch struct needs to be encoded using Thrift’s binary encoding
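A minimal sketch of the kind of request that documentation describes, built with only the standard library. It assumes the Batch has already been serialized to bytes with Thrift's binary protocol. The /api/traces path and the application/vnd.apache.thrift.binary content type are assumptions based on the Jaeger "Thrift over HTTP" docs and should be checked against the Jaeger version in use. build_collector_request and its parameters are hypothetical names, not part of opencensus.

```python
import urllib.request


def build_collector_request(payload, host="localhost", port=14268,
                            scheme="http"):
    """Build (but do not send) a POST carrying a binary-encoded Batch.

    Hypothetical helper: `payload` must already be the Thrift
    binary-encoded Batch bytes. `scheme` is a parameter so that
    HTTPS collectors can be targeted as well.
    """
    url = "{}://{}:{}/api/traces".format(scheme, host, port)
    return urllib.request.Request(
        url,
        data=payload,
        # Content type assumed from the Jaeger "Thrift over HTTP" docs.
        headers={"Content-Type": "application/vnd.apache.thrift.binary"},
        method="POST",
    )


# Placeholder bytes only; this is not a real encoded Batch.
request = build_collector_request(b"\x0c\x00\x01", scheme="https")
# Sending would be: urllib.request.urlopen(request, timeout=5)
```

The point of the sketch is that the body is the encoded struct itself, sent as a plain HTTP POST, with no Thrift RPC call wrapped around it.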

I think the implementation was broken from day 1. I assume every user is using agent mode instead of the collector. There is another issue for us: it assumes the API scheme is HTTP, but we're using HTTPS.

Here is a quick fix I did; maybe I can open a PR in my free time. In opencensus.ext.jaeger.trace_exporter.Collector.emit, change

self.client.submitBatches([batch])

to

batch.write(self.client._oprot)
self.client._oprot.trans.flush()
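The same two calls can be wrapped in a small helper to make the order of operations explicit. This is only a sketch: emit_without_rpc and the Fake* classes are hypothetical stand-ins so the snippet runs without a collector, and _oprot is a private attribute of the Thrift-generated client, so relying on it may break across Thrift versions.

```python
def emit_without_rpc(client, batch):
    """Write the Batch struct itself, then flush the transport.

    Unlike client.submitBatches([batch]), this does not wrap the
    struct in a Thrift RPC message or wait for an RPC reply.
    """
    batch.write(client._oprot)    # serialize only the struct
    client._oprot.trans.flush()   # with an HTTP transport, flush sends the request


# Hypothetical stand-ins that record what was called.
class FakeTransport:
    def __init__(self):
        self.flushed = False

    def flush(self):
        self.flushed = True


class FakeProtocol:
    def __init__(self):
        self.trans = FakeTransport()
        self.written = []


class FakeClient:
    def __init__(self):
        self._oprot = FakeProtocol()


class FakeBatch:
    def write(self, oprot):
        oprot.written.append("batch")


client = FakeClient()
emit_without_rpc(client, FakeBatch())
```

With the real exporter, the client and batch would be the objects already available inside Collector.emit.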