aws / aws-xray-sdk-go

AWS X-Ray SDK for the Go programming language.
Apache License 2.0
276 stars 118 forks source link

xray.SqlContext duplicates all non-trivial SQL in trace #429

Open fdegiuli opened 8 months ago

fdegiuli commented 8 months ago

The connections provided by the go MySQL driver will return ErrSkip to any SQL query with parameters unless the interpolateParams option is explicitly set to true--it is false by default. (See https://github.com/go-sql-driver/mysql/blob/master/connection.go#L368)

This SDK's implementation of SqlContext creates a subsegment for both the original aborted non-call (which returns ErrSkip and is never executed) and the following actual call to the database.

This causes the majority of SQL queries to show up twice in the trace; one with skip fast-path; continue as if unimplemented appended to the query, and then another with just the query text.

Ideally SqlContext would avoid creating a subsegment when it receives ErrSkip

See https://github.com/aws/aws-xray-sdk-go/blob/master/xray/sql_context.go#L238 if my explanation is confusing

fdegiuli commented 8 months ago

I'm happy to put up an MR if we think this is worth doing.

My thought was to introduce a segment.Abort() method, which simply removes the segment from its parent and discards it. The Capture function could then look for a sentinel error and abort the segment instead of closing it.

Let me know if that seems reasonable or if there's a better way.

wangzlei commented 8 months ago

Thanks for raising this case. Trace, as a monitoring tool, records the conditions that occur during program execution, much like logs and metrics. Therefore, similar to retries, Sql queries with ErrSkip should also be logged. Simply deleting ErrSkip subsegments is not a generic solution for every user. If presenting ErrSkip alongside successful SQL queries creates a perception of duplicates for customers, we should find a way to enhance the user experience in this area. At the very least, we need to ensure that ErrSkip subsegments contain clear indications, preventing customers from easily misunderstanding the situation.

Could you help provide the trace raw data and a trace graph screenshot with ErroSkip and successful queries? We will investigate whether it could be improved.