GoogleCloudPlatform / bigquery-antipattern-recognition

Utility to identify and rewrite common anti patterns in BigQuery SQL syntax
Apache License 2.0
76 stars 26 forks source link

Adding SelfJoin anti pattern #36

Open paraschavda2411 opened 9 months ago

franklinWhaite commented 9 months ago
  1. Please resolve the conflicts in main
  2. Please add the an example of a self join the the sample folder and the sample csv
  3. Please add an example of the anti pattern in the readme as shown here
franklinWhaite commented 9 months ago

Please add the following test case:

SELECT 
  date(t1.trip_start_timestamp) dt,
  COUNT(DISTINCT t1.unique_key) ct,
  SUM(t2.ct) ct_month
FROM 
  `bigquery-public-data.chicago_taxi_trips.taxi_trips` t1
LEFT JOIN
  (SELECT 
    FORMAT_DATE("%Y-%m", trip_start_timestamp) month, 
    COUNT(DISTINCT unique_key) ct
  FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
  GROUP BY month
  ) t2 
    ON month = FORMAT_DATE("%Y-%m", t1.trip_start_timestamp)
WHERE
  FORMAT_DATE('%Y-%m', t1.trip_start_timestamp) = '2023-02'
  AND t2.month = '2023-02'
GROUP BY
  dt
ORDER BY
  dt desc
franklinWhaite commented 9 months ago

Bug The following query is incorrectly labeled as a case of the self-join anti-pattern

`

SELECT
    t1.col1
FROM 
    table1 t1 
JOIN 
    table2 t2 on t1.col1 = t2.col2
UNION ALL
SELECT
    t2.col1
FROM 
    table2 t2 
JOIN 
    table3 t3 on t1.col1 = t2.col2
paraschavda2411 commented 9 months ago

Resolved the bug Added the test case suggested Resolved the conflicts in main Added sample csv, queries and outputs Updated the read-me content

Please review and let me know if anything