tracing: allow specifying parent traceid/spanid when starting a trace

cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

https://www.cockroachlabs.com

Other

29.84k stars 3.77k forks source link

tracing: allow specifying parent traceid/spanid when starting a trace #19403

Open RaduBerinde opened 6 years ago

RaduBerinde commented 6 years ago

From https://forum.cockroachlabs.com/t/error-communicating-with-tls-secured-zipkin-collector/1029/3?u=radu

it’d also be useful to explore some way to set the session trace context explicitly. I’m thinking of some parameters when enabling tracing such as SET TRACE = 'TraceID:bd7a977555f6b982; SpanID:ebf33e1a81dc6f71';, and have Cockroach use those identifiers for its parent span. That would allow an application to pass its tracing context into Cockroach, and show Cockroach’s internal tracing in the context of the rest of the trace across services.

Jira issue: CRDB-5974

Epic: CRDB-11665

RaduBerinde commented 6 years ago

CC @andreimatei

andreimatei commented 6 years ago

I agree with the spirit. It'd be similarly cool to have some sort of support for passing a trace id in a sql query (as a comment).

dianasaur323 commented 6 years ago

@asubiotto a user requested this - would it be hard to wrap into the work you are already doing with trace?

andreimatei commented 6 years ago

I think the thing to do here is to add support for parsing trace ids out of SQL comments and then working with one of the Go drivers that supports the new-ish Go sql methods that take a context for generating the comment. On the server side we'd need to become more flexible in where we allow spans to be generated. I think the current code makes some assumptions about having a span-per-txn.

And then extreme care should be taken around the context that the TxnCoordSender captures for a txn. I think you currently can't have another cancellable ctx derived from the txn one, or something. This is major pain that #23055 is trying to address, but that work is stalled for a bit.

andreimatei commented 6 years ago

Oh and we should see what other databases/tracing systems are doing for propagating trace info to the database. I think other databases support such things too.

asubiotto commented 6 years ago

@dianasaur323, I'm not really doing any work with tracing other than outputting stats collection to the trace.

d4l3k commented 6 years ago

Just putting my plug in for this. It definitely would have be a cool feature to have since I have opentracing enabled on both CockroachDB and my application. I'm not actually sure how useful it would be since I already time the requests and that can identify any hot spots. That being said, query planning info (indexes etc) tagged to the trace would be a nice feature.

dylanisagit commented 5 years ago

Would like to add a +1 to this idea, it would be incredibly powerful. Right now, we're tracing all of our services but there's no way to have a span go all the way through a service and into CDB.

As a related item, it would be so helpful to be able to set trace levels. Right now, it seems to be binary - either CDB is tracing or it's not. This is hammering our tracing infrastructure, creating an order of magnitude more spans that all of our other services combined.

RaduBerinde commented 5 years ago

Thank you @Dylan-OMahony-Bose, this is good feedback! What kind of control would you like to see w.r.t what we trace and what we don't? Perhaps a mode where there is a single span per SQL query?

dylanisagit commented 5 years ago

Absolutely, that would be amazing - a verbose mode (as it is today) and a "production" mode (as single span per query, as you suggest), would be enough control. Think of it akin to DEBUG vs. FATAL log levels.

akshayjshah commented 5 years ago

This would be incredibly helpful - tracing browser requests all the way into CDB would be an immensely powerful debugging tool.

The Jaeger collector supports adaptive sampling, but there's some open work to get it completely finished. At the least, probabilistic tracing in "production" mode would be a nice way to keep some visibility but avoid crushing the tracing backend.

andreimatei commented 5 years ago

probabilistic tracing in "production" mode would be a nice way to keep some visibility but avoid crushing the tracing backend.

We're actually also working on that - our own trace collector that's somewhat "adaptive" in the sense that it will try to give you a certain number of samples for each component.

alienth commented 4 years ago

Would like to +1 this - we add our tracing span IDs to all of our SQL queries via baseplate. Would be incredibly handy if a database engine could capture this for tracing.

andreimatei commented 4 years ago

The time for a new round of investment in our tracing capabilities is coming, as in the next release (20.1) we hope to be able to display some query traces in our UI. I'll keep this request in mind. Thanks for pointing me to baseplate.

ajwerner commented 4 years ago

This would be pretty great.

I'm not sure how I feel about using the comment as in baseplate. I think I like it generally but we'd need to complicate the parsing process as now we'll need to examine comments. Perhaps we could put it into the parsed annotations? Another alternative would be to add it as a transaction setting using SET TRANSACTION. That would have the downside of not working for implicit transactions.

ajwerner commented 4 years ago

I looked a bit through how comments get parsed in our grammar (or rather don't) and I no longer like the comment idea. Perhaps we can add some syntax to provide settings for the following statement. A SET STATEMENT ...; that takes the same things as SET TRANSACTION and then the following statement inherits the settings.

RaduBerinde commented 4 years ago

There is another usecase for comments: query hints. It may be useful to somehow annotate AST nodes with the comments around them.