DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0
2.84k stars 1.2k forks source link

broken queries from obfuscator #7167

Open dhduvall opened 3 years ago

dhduvall commented 3 years ago

Describe what happened: Made a MySQL query from Django that contained one of these queries:

The traces had corresponding query fingerprints of

Note the commas missing after the ? making the queries syntactically incorrect.

Describe what you expected: The fingerprints should have been

Note the commas now correctly present after the ?.

Steps to reproduce the issue: Apply this diff to 53c30898b:

diff --git a/pkg/trace/obfuscate/sql_test.go b/pkg/trace/obfuscate/sql_test.go
index 1e4514e39..e5a77d1a2 100644
--- a/pkg/trace/obfuscate/sql_test.go
+++ b/pkg/trace/obfuscate/sql_test.go
@@ -672,6 +672,14 @@ ORDER BY [b].[Name]`,
                        `SELECT id, name FROM emp WHERE name LIKE {fn UCASE('Smith')}`,
                        `SELECT id, name FROM emp WHERE name LIKE ?`,
                },
+               {
+                       `SELECT CONCAT(v.address_1, ' ', v.address_2) FROM Venues v`,
+                       `SELECT CONCAT ( v.address_1, ?, v.address_2 ) FROM Venues v`,
+               },
+               {
+                       `SELECT IF (name IS NOT NULL, name, 'default')`,
+                       `SELECT IF ( name IS NOT ?, name, ? )`,
+               },
        }

        for _, c := range cases {

and run the tests (go test -mod=vendor -v ./pkg/trace/obfuscate), to see:

=== RUN   TestSQLQuantizer/#75
    sql_test.go:689:
            Error Trace:    sql_test.go:689
            Error:          Not equal:
                            expected: "SELECT CONCAT ( v.address_1, ?, v.address_2 ) FROM Venues v"
                            actual  : "SELECT CONCAT ( v.address_1, ? v.address_2 ) FROM Venues v"

                            Diff:
                            --- Expected
                            +++ Actual
                            @@ -1 +1 @@
                            -SELECT CONCAT ( v.address_1, ?, v.address_2 ) FROM Venues v
                            +SELECT CONCAT ( v.address_1, ? v.address_2 ) FROM Venues v
            Test:           TestSQLQuantizer/#75
=== RUN   TestSQLQuantizer/#76
    sql_test.go:689:
            Error Trace:    sql_test.go:689
            Error:          Not equal:
                            expected: "SELECT IF ( name IS NOT ?, name, ? )"
                            actual  : "SELECT IF ( name IS NOT ? name, ? )"

                            Diff:
                            --- Expected
                            +++ Actual
                            @@ -1 +1 @@
                            -SELECT IF ( name IS NOT ?, name, ? )
                            +SELECT IF ( name IS NOT ? name, ? )
            Test:           TestSQLQuantizer/#76
--- FAIL: TestSQLQuantizer (0.00s)
ThePianoDentist commented 2 years ago

Same issue for me with a postgresql query:

Version Agent 7.35.2 - Commit: edd49be - Serialization version: v5.0.16 - Go version: go1.17.6 on Ubuntu 22.04 LTS (Jammy Jellyfish)

coalesce((select (U0."limit" - U0."value") as "difference" from "table1" U0 inner join "table2" U1 on (U0."col1" = U1."id" and (U0."col2" = %s)) where U1."id" = ("table3"."id") order by "difference" asc limit 1), %s) as "col4"

ends up as

COALESCE ( ( SELECT ( U0. limit - U0. value ) FROM table1 U0 INNER JOIN table2 U1 ON ( U0. col1 = U1. id AND ( U0. col2 = ? ) ) WHERE U1. id = ( table3 . id ) ORDER BY difference ASC LIMIT ? ) )

It has lost:

Also further on in the SQL it has a couple of incorrect interpretations of parameters, where it thinks a table name is a parameter: e.g. INNER JOIN table1 ON ( table2 . col1 = ? . id )

I am using tableN/colN names because the code is proprietary. But the real names are very "normal" names, snake-case with no keywords.

I confirmed the python dd-trace library had the correct SQL string as it's resource.