dedis / onet

Overlay Network for distributed protocols
GNU Lesser General Public License v3.0
51 stars 28 forks source link

Adding tracing services to Onet #623

Closed ineiti closed 4 years ago

ineiti commented 4 years ago

Instead of simply logging message to the console, syslog, or graylog, it is much better to do actual tracing of your code. To avoid having to add tracing messages throughout the code, this PR proposes a new onet-logger that converts the current log-calls to tracing-magick.

Currently there are two tracers written:

This code is currently tested on c4dt's 2nd conode, and it seems to work nicely.

Closes #618

ineiti commented 4 years ago

TODO (please add others):

ineiti commented 4 years ago

As I understood it, tracing is quite different from logging. Logging show status/progress at each step, tracing try to measure a span of execution.

Yes, and that's the whole point of this PR: convert the current logging into a tracing, using the fact that golang's stack-traces are detailed enough to separate the different traces.

So whenever a log comes in, the Trace structure looks it up and checks whether that trace already exists. Then it inserts it correctly in the tree, simulating calls from methods to another. This is why it needs to keep track of context.Context.

The alternative is to actively rewrite onet and cothority to use a tracing framework...

gnarula commented 4 years ago

So whenever a log comes in, the Trace structure looks it up and checks whether that trace already exists. Then it inserts it correctly in the tree, simulating calls from methods to another. This is why it needs to keep track of context.Context.

I might be wrong here but IMHO logs are something which are isolated to a single node and traces "span" across multiple nodes. Case in point, the hierarchy in the example at https://www.jaegertracing.io/img/trace-detail-ss.png.

IIRC, tracing libraries usually keep the span info in a context.Context and propagate it for all network requests. I'm not sure if we're doing the same here because getTraceSpan is only looking at the method signature. Am I missing something?

In other words, how would a message being handled by tree node at level N know the span info about its parent at level N - 1?

ineiti commented 4 years ago

In other words, how would a message being handled by tree node at level N know the span info about its parent at level N - 1?

That's the limit of the solution proposed here. Theoretically one could push it further and use the protocol-ID to merge traces. But then I think it would be difficult to find a frontend that can visualize a BFT-BLSCosi signature. Or, even worse, a full trace starting from a ClientTranaction request to the final block...

So the scope of this PR is to have a better log-aggregation that goes somewhat towards tracing. The traceWrapper structure keeps track of the 'simulated' context.Context structure. What I'm currently struggling with are the go-routines, as they are difficult to track through the string returned by runtime.Stack.

ineiti commented 4 years ago

OK, without adding extra information in the code this will be very difficult. I tried to get an automatic detection of which go-routines go together, but this is not something that go supports. So you have to scrape it together using runtime.Stack and stuff like that, which makes it awful.

I think I'll add a log.TraceID anyway where you can set an ID, if you have one. Currently the only ID that is available is RoundID from the onet.Token, which is used in protocols. But it is not ideal - the best would be of course to replace all the log. messages with real tracing messages. But that looks like a lot of work :(

ineiti commented 4 years ago

log.TraceID is added and seems to work nicely. Now I can for example follow a blscosi signature through the different phases. It's not too pretty, but not too bad, neither.

It's a bit finnicky to set up with putting the log.TraceID in the right places.

ineiti commented 4 years ago

OK, this looks somewhat usable. I'm waiting for @tharvik and @cgrigis to play around with honeycomb.io, and if they have a not-too-difficult proposition to be included, I'll do it.

Else this looks nearly final to me (except @tharvik's code suggestions, of course).

ineiti commented 4 years ago

As soon as it passes the tests I think it's R4M. Except comments, of course.

ineiti commented 4 years ago

Heisenbug - need to fix it first.

ineiti commented 4 years ago

Found it - goroutines are of course not deterministic... Should be ready 4 merge now.