ethercrow / opentelemetry-haskell

The OpenTelemetry Haskell Client https://opentelemetry.io

Improve the intermediate data format #8

Closed ethercrow closed 4 years ago

ethercrow commented 4 years ago

Currently, data is written to the eventlog using the String-based functions, in a crude format where the codec is basically words and unwords: https://github.com/ethercrow/opentelemetry-haskell/blob/master/opentelemetry/src/OpenTelemetry/Eventlog.hs#L17

It should be replaced with something more typed and written using the ByteString-based trace API.

Please note that the opentelemetry library should not depend on anything that is not shipped with GHC.
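
A minimal sketch of what a more typed format could look like, using only `base` and `bytestring` (both ship with GHC). The `Event` type, the tag bytes, and the little-endian layout are assumptions made for illustration, not the format this library adopted. How the resulting bytes reach the eventlog is left out.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as LBS
import Data.Bits (shiftL, (.|.))
import Data.Word (Word64)

-- Hypothetical message type: one constructor per kind of eventlog message.
data Event
  = BeginSpan !Word64 !BS.ByteString -- span id, operation name
  | EndSpan !Word64                  -- span id
  deriving (Eq, Show)

-- Layout (assumed): 1 tag byte, 8-byte little-endian id, then the payload.
encodeEvent :: Event -> BS.ByteString
encodeEvent ev = LBS.toStrict . B.toLazyByteString $ case ev of
  BeginSpan sid op -> B.word8 1 <> B.word64LE sid <> B.byteString op
  EndSpan sid      -> B.word8 2 <> B.word64LE sid

-- Inverse of encodeEvent; returns Nothing on malformed input.
decodeEvent :: BS.ByteString -> Maybe Event
decodeEvent bs = do
  (tag, rest) <- BS.uncons bs
  let (idBytes, payload) = BS.splitAt 8 rest
  sid <- if BS.length idBytes == 8 then Just (readWord64LE idBytes) else Nothing
  case tag of
    1 -> Just (BeginSpan sid payload)
    2 | BS.null payload -> Just (EndSpan sid)
    _ -> Nothing

-- Fold the bytes from last to first so the first byte ends up least significant.
readWord64LE :: BS.ByteString -> Word64
readWord64LE = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0
```

With this layout an `EndSpan` message is always 9 bytes, and `decodeEvent (encodeEvent e)` gives back `Just e`.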

yaitskov commented 4 years ago

I benchmarked various ways of writing to the eventlog. Besides using the unsafe ByteString API, I would try batching the data with a 1 KB buffer.
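
A rough sketch of the batching idea, assuming a 1024-byte threshold. The `Batch` type, `push`, `flush`, and the sink callback are hypothetical names and do not come from the benchmark repository. The sketch is not thread-safe.

```haskell
import qualified Data.ByteString as BS
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical batching buffer. The sink is whatever finally writes a
-- full batch, for example a ByteString-based trace function.
data Batch = Batch
  { batchSink :: BS.ByteString -> IO ()
  , batchBuf  :: IORef [BS.ByteString] -- pending chunks, newest first
  , batchSize :: IORef Int             -- total bytes pending
  }

-- Assumed flush threshold of 1 KB.
batchLimit :: Int
batchLimit = 1024

newBatch :: (BS.ByteString -> IO ()) -> IO Batch
newBatch sink = Batch sink <$> newIORef [] <*> newIORef 0

-- Queue one message; hand everything to the sink once the limit is reached.
push :: Batch -> BS.ByteString -> IO ()
push b msg = do
  chunks <- readIORef (batchBuf b)
  size <- readIORef (batchSize b)
  let chunks' = msg : chunks
      size' = size + BS.length msg
  if size' >= batchLimit
    then do
      writeIORef (batchBuf b) []
      writeIORef (batchSize b) 0
      batchSink b (BS.concat (reverse chunks'))
    else do
      writeIORef (batchBuf b) chunks'
      writeIORef (batchSize b) size'

-- Write out whatever is pending, for example at program exit.
flush :: Batch -> IO ()
flush b = do
  chunks <- readIORef (batchBuf b)
  writeIORef (batchBuf b) []
  writeIORef (batchSize b) 0
  if null chunks then pure () else batchSink b (BS.concat (reverse chunks))
```

The trade-off is fewer writes to the eventlog in exchange for losing per-message timestamps and needing a final `flush`.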

https://github.com/yaitskov/eventslog-benchmark

yaitskov commented 4 years ago

@ethercrow why is decimal format used in the eventlog?

e.g. in beginSpan

  liftIO $ traceEventIO (printf "ot2 begin span %d %s" u64 operation)

It is faster to convert a number to hex or Base64, and both take less space. Is it possible to change that?

Another note: the variable is 8 bytes, but the random number function returns 4 bytes. So why not shrink the variable to 4 bytes?
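
To make the space argument concrete, here is a small sketch comparing the textual length of a 64-bit ID in decimal and in hex. The helper names are made up for this example, and it says nothing about conversion speed.

```haskell
import Data.Word (Word64)
import Numeric (showHex)

-- Decimal rendering, comparable to what printf "%d" produces for the ID.
decimalId :: Word64 -> String
decimalId = show

-- Lowercase hex rendering without a "0x" prefix.
hexId :: Word64 -> String
hexId w = showHex w ""
```

For the largest `Word64`, `decimalId` yields 20 characters and `hexId` yields 16, while a raw binary encoding would need 8 bytes.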

mpickering commented 4 years ago

You would surely be better off benchmarking with -l-au, so that you don't get all the normal RTS events in the eventlog.

ethercrow commented 4 years ago

> @ethercrow why is decimal format used in the eventlog?

The current String-based format is not performance-oriented, so it was an arbitrary choice. In a future binary format, all those span and trace IDs should be encoded as 8 bytes.

> The variable is 8 bytes, but the random number function returns 4 bytes.

I don't get it, what is 4 bytes here?

yaitskov commented 4 years ago

@ethercrow,

> I don't get it, what is 4 bytes here?

The random number generator returns an Int, which is then cast to Word64:

(hashUnique <$> liftIO newUnique) :: m Int

> The current String-based format is not performance-oriented, so it was an arbitrary choice. In a future binary format, all those span and trace IDs should be encoded as 8 bytes.

Is it possible to write binary data to the eventlog? I saw that there are two different types, UserMessage and UserBinaryMessage, but the unsafeTraceIO comment warns that the ByteString must be 0-terminated.

yaitskov commented 4 years ago

@mpickering, thanks for the tip about -l-au. I didn't know about it.

ethercrow commented 4 years ago

Int is 8 bytes on most systems these days.

> maxBound @Int32
2147483647
> maxBound @Int 
9223372036854775807
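
The same check can be done from code. This sketch reports the width of `Int` on the current platform and widens a `hashUnique` result to `Word64`, as in the snippet quoted earlier in the thread. The names `intBits` and `freshId` are made up for this example.

```haskell
import Data.Bits (finiteBitSize)
import Data.Unique (hashUnique, newUnique)
import Data.Word (Word64)

-- Width of Int in bits on the platform this is compiled for.
intBits :: Int
intBits = finiteBitSize (0 :: Int)

-- Hypothetical ID source: a Unique's hash widened to Word64.
-- fromIntegral does not truncate here as long as Int is at most 64 bits wide.
freshId :: IO Word64
freshId = fromIntegral . hashUnique <$> newUnique
```

On a 64-bit GHC, `intBits` is 64, which matches the `maxBound @Int` output above.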

yaitskov commented 4 years ago

19