(apologies – I don't know why the diff is so large when I just added more correct answers to just eight questions)
This mostly fixes two categories of problems: one where doing a LEFT JOIN on an arbitrarily selected initial table was causing additional rows to appear in the data.
In the example below, both are the correct queries to answer the question How does the ratio of authors to organizations differ by continent?, yet only the first was previously marked correct
Query:
SELECT organization.continent, COUNT(DISTINCT author.aid)::float / NULLIF(COUNT(DISTINCT organization.oid), 0) AS ratio
FROM organization
LEFT JOIN author ON author.oid = organization.oid
GROUP BY organization.continent ORDER BY ratio DESC NULLS LAST;
Data:
continent | ratio
---------------+-------
North America | 1.5
Europe | 0.5
Asia | 0
Query:
SELECT organization.continent, COUNT(DISTINCT author.aid)::float / NULLIF(COUNT(DISTINCT organization.oid), 0) AS ratio
FROM author
LEFT JOIN organization ON author.oid = organization.oid
GROUP BY organization.continent ORDER BY ratio DESC NULLS LAST;
Data:
continent | author_to_organization_ratio
---------------+------------------------------
North America | 1.5
Europe | 0.5
The second big category is adding multiple accepted answers if a title columns is also being added. Like so:
SQL:
SELECT paperkeyphrase.paperid, COUNT(paperkeyphrase.keyphraseid) AS keyphrase_count
FROM paperkeyphrase
GROUP BY paperkeyphrase.paperid
ORDER BY keyphrase_count DESC NULLS LAST;
Answer:
paperid | keyphrase_count
---------+-----------------
3 | 2
5 | 2
4 | 1
2 | 1
1 | 1
SQL:
SELECT p.title, COUNT(pk.keyphraseid) AS num_keyphrases
FROM paper p
JOIN paperkeyphrase pk ON p.paperid = pk.paperid
GROUP BY p.title
ORDER BY num_keyphrases DESC NULLS LAST;
Answer
title | num_keyphrases
----------------------------------------------+----------------
Machine Learning in Tackling Climate Change | 2
Social Media and Mental Health | 2
COVID-19 Impact on Society | 1
A Study on Machine Learning Algorithms | 1
The Effects of Climate Change on Agriculture | 1
Renamed correct_instructions to instructions, and the old instructions to full_instructions. This ensures that we use the right instructions set in our queries for all evals
Clarified some questions and queries in the instruct-basic and instruct-advanced question sets
(apologies – I don't know why the diff is so large when I just added more correct answers to just eight questions)
This mostly fixes two categories of problems: one where doing a
LEFT JOIN
on an arbitrarily selected initial table was causing additional rows to appear in the data.In the example below, both are the correct queries to answer the question
How does the ratio of authors to organizations differ by continent?
, yet only the first was previously marked correctThe second big category is adding multiple accepted answers if a title columns is also being added. Like so: