defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0
485 stars 52 forks source link

Provide multiple correct answers for some questions #119

Closed rishsriv closed 2 months ago

rishsriv commented 2 months ago

(apologies – I don't know why the diff is so large when I just added more correct answers to just eight questions)

This mostly fixes two categories of problems: one where doing a LEFT JOIN on an arbitrarily selected initial table was causing additional rows to appear in the data.

In the example below, both are the correct queries to answer the question How does the ratio of authors to organizations differ by continent?, yet only the first was previously marked correct

Query:
SELECT organization.continent, COUNT(DISTINCT author.aid)::float / NULLIF(COUNT(DISTINCT organization.oid), 0) AS ratio
FROM organization
LEFT JOIN author ON author.oid = organization.oid
GROUP BY organization.continent ORDER BY ratio DESC NULLS LAST;

Data:
   continent   | ratio 
---------------+-------
 North America |   1.5
 Europe        |   0.5
 Asia          |     0

Query:
SELECT organization.continent, COUNT(DISTINCT author.aid)::float / NULLIF(COUNT(DISTINCT organization.oid), 0) AS ratio
FROM author
LEFT JOIN organization ON author.oid = organization.oid
GROUP BY organization.continent ORDER BY ratio DESC NULLS LAST;

Data:
   continent   | author_to_organization_ratio 
---------------+------------------------------
 North America |                          1.5
 Europe        |                          0.5

The second big category is adding multiple accepted answers if a title columns is also being added. Like so:

SQL:
SELECT paperkeyphrase.paperid, COUNT(paperkeyphrase.keyphraseid) AS keyphrase_count
FROM paperkeyphrase
GROUP BY paperkeyphrase.paperid
ORDER BY keyphrase_count DESC NULLS LAST;

Answer:
 paperid | keyphrase_count 
---------+-----------------
       3 |               2
       5 |               2
       4 |               1
       2 |               1
       1 |               1

SQL:
SELECT p.title, COUNT(pk.keyphraseid) AS num_keyphrases
FROM paper p
JOIN paperkeyphrase pk ON p.paperid = pk.paperid
GROUP BY p.title
ORDER BY num_keyphrases DESC NULLS LAST;

Answer
                    title                     | num_keyphrases 
----------------------------------------------+----------------
 Machine Learning in Tackling Climate Change  |              2
 Social Media and Mental Health               |              2
 COVID-19 Impact on Society                   |              1
 A Study on Machine Learning Algorithms       |              1
 The Effects of Climate Change on Agriculture |              1
rishsriv commented 2 months ago

Additionally:

  1. Renamed correct_instructions to instructions, and the old instructions to full_instructions. This ensures that we use the right instructions set in our queries for all evals
  2. Clarified some questions and queries in the instruct-basic and instruct-advanced question sets