defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0
540 stars 56 forks source link

Fix sql for author to org ratio #193

Closed wendy-aw closed 3 months ago

wendy-aw commented 3 months ago

Fixed a SQL in questions_gen that had differing correct answers due to joins. Question: How does the ratio of authors to organizations differ by continent? Previous golden SQLs and their output

SELECT
  o.continent,
  CAST(COUNT(DISTINCT a.aid) AS REAL) / NULLIF(COUNT(DISTINCT o.oid), 0) AS author_to_organization_ratio
FROM
  author AS a
  JOIN organization AS o ON a.oid = o.oid
GROUP BY
  o.continent
ORDER BY
  author_to_organization_ratio DESC;

   continent   | author_to_organization_ratio 
---------------+------------------------------
 North America |                          1.5
 Europe        |                            1
SELECT
  organization.continent,
  CAST(COUNT(DISTINCT author.aid) AS REAL) / NULLIF(COUNT(DISTINCT organization.oid), 0) AS ratio
FROM
  organization
  LEFT JOIN author ON author.oid = organization.oid
GROUP BY
  organization.continent
ORDER BY
  ratio DESC;

   continent   | ratio 
---------------+-------
 North America |   1.5
 Europe        |   0.5
 Asia          |     0
SELECT
  organization.continent,
  CAST(COUNT(DISTINCT author.aid) AS REAL) / NULLIF(COUNT(DISTINCT organization.oid), 0) AS ratio
FROM
  author
  LEFT JOIN organization ON author.oid = organization.oid
GROUP BY
  organization.continent
ORDER BY
  ratio DESC;

   continent   | ratio 
---------------+-------
               |      
 North America |   1.5
 Europe        |     1

Only the second is correct. Also added another possible correct answer without the 0 row.

WITH
  author_counts AS (
    SELECT
      o.continent,
      COUNT(DISTINCT a.aid) AS author_count
    FROM
      author AS a
      JOIN organization AS o ON a.oid = o.oid
    GROUP BY
      o.continent
  ),
  organization_counts AS (
    SELECT
      o.continent,
      COUNT(DISTINCT o.oid) AS organization_count
    FROM
      organization AS o
    GROUP BY
      o.continent
  )
SELECT
  ac.continent,
  CAST(ac.author_count AS FLOAT) / NULLIF(oc.organization_count, 0) AS author_to_organization_ratio
FROM
  author_counts AS ac
  JOIN organization_counts AS oc ON ac.continent = oc.continent
ORDER BY
  author_to_organization_ratio DESC;

  continent   | author_to_organization_ratio 
---------------+------------------------------
 North America |                          1.5
 Europe        |                          0.5
wendy-aw commented 3 months ago

Yikes sorry. Lemme rebase

wendy-aw commented 3 months ago

Aah, thank you for identifying and fixing this! +1 for the CTE approach -- so much more readable

Our own model came up with that :D

rishsriv commented 3 months ago

Oh wow, so cool!