Magolor closed this issue 12 months ago.
Thank you for your valuable suggestion! We just updated the GPT-4 results on the leaderboard.
@Magolor Thanks for your interest in our work.
For the first issue, you can simply remove the BOM and parse the CSV as `latin1`.
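A minimal sketch of that workaround using only the standard library, assuming each description file fits in memory. It strips the UTF-8 BOM, tries strict UTF-8 first, and falls back to `latin-1` only if that fails. The UTF-8-first order and the helper name `read_description_csv` are illustrative choices, not part of BIRD's tooling. pandas users can pass the decoded text to `pd.read_csv(io.StringIO(text))` instead of `csv.DictReader`.

```python
import codecs
import csv
import io


def read_description_csv(path: str) -> list[dict[str, str]]:
    """Parse a description CSV that starts with a UTF-8 BOM but may contain non-UTF-8 bytes.

    Hypothetical helper for illustration; not part of the BIRD release.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # Strip the BOM so it cannot end up glued to the first header name.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        # Strict UTF-8 first, so files that are valid UTF-8 keep their characters intact.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this decode cannot fail.
        text = raw.decode("latin-1")
    return list(csv.DictReader(io.StringIO(text)))
```

The `latin-1` fallback applies to the whole file, so genuine multi-byte UTF-8 characters in a file that also contains a stray byte will come out garbled (for example `é` becomes `Ã©`); those cells may still need a manual check.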
For the second issue, we only provide descriptions for columns where they add meaning; `home_player_<number>` is intuitive to understand.
For the third issue, thanks for pointing it out! Please download the dev data again; we have updated the descriptions so that `original_column_name` matches the column names in the databases. Thanks!
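To check that a description file lines up with its table after re-downloading, one option is to compare its `original_column_name` values against the columns SQLite reports through `PRAGMA table_info`. A sketch, assuming those values have already been read from the CSV into a list. `compare_columns` is a hypothetical helper, and the comparison is case-insensitive because SQLite treats column names that way.

```python
import sqlite3


def compare_columns(
    conn: sqlite3.Connection, table: str, described: list[str]
) -> tuple[list[str], list[str]]:
    """Return (table columns missing from the CSV, CSV names that are not table columns).

    Hypothetical helper for illustration; not part of the BIRD release.
    """
    # PRAGMA does not accept bound parameters, so the table name is quoted inline
    # (this assumes the name itself contains no double quote).
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    actual = {row[1].lower(): row[1] for row in rows}
    # Compare case-insensitively and ignore stray whitespace around CSV values.
    described_keys = {name.strip().lower() for name in described}
    missing_from_csv = sorted(name for key, name in actual.items() if key not in described_keys)
    unknown_in_csv = sorted(name for name in described if name.strip().lower() not in actual)
    return missing_from_csv, unknown_in_csv
```

Both lists coming back empty means every table column is described and every described name exists in the table.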
@Magolor GPT-4 results have been updated on https://bird-bench.github.io/
Currently, I'm only playing with BIRD's dev set. I found the following problems in the dev set's `database_description` files:

1. Encoding issue with `.csv` files: These files are claimed to be `utf-8` encoded with a BOM, but many contain non-`utf-8` characters. For example, in the `formula_1` database's `qualifying` table, there's an invisible character in the first row's `value_description`. This causes the pandas CSV reader to fail. The BOM at the start of the `.csv` files prevents falling back to other encodings such as `latin1` or `iso-8859-1`. The current workaround is to manually delete these invisible characters in multiple files.
2. Schema mismatch between `.csv` and `.sqlite` files: Some `.csv` files in `database_description` don't match the `.sqlite` table schema. For example, in the `european_football_2` database, the table contains columns named `home_player_<number>`, which are absent from the `.csv` files. These files only contain `home_player_X<number>` and `home_player_Y<number>`, which are also not well described.
3. Incorrect `.csv` file names: Some `.csv` files have incorrect names. For example, in the `card_games` database, `ruling.csv` and `Set_transactions.csv` should be `rulings.csv` and `Set_translations.csv`, respectively.

Additionally, it seems the leaderboard has not been updated since March. Now that GPT-4 is open, could the leaderboard be updated with GPT-4 standings so we don't have to run the scripts separately?
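For the incorrect file names, one way to spot mismatches is to list the tables in each `.sqlite` file and look for a `.csv` with the same stem in `database_description`. A minimal sketch, assuming one description file per table, named after that table. The helper name `tables_without_description` and the case-insensitive match (so a hypothetical `set_translations` table would count as described by `Set_translations.csv`) are assumptions, not part of the dataset's tooling.

```python
import sqlite3
from pathlib import Path


def tables_without_description(conn: sqlite3.Connection, desc_dir: str) -> list[str]:
    """List tables that have no same-named .csv file in desc_dir (case-insensitive).

    Hypothetical helper for illustration; not part of the BIRD release.
    """
    # Skip SQLite's internal tables such as sqlite_sequence.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    stems = {p.stem.lower() for p in Path(desc_dir).glob("*.csv")}
    return sorted(t for t in tables if t.lower() not in stems)
```

Assuming the table names match the corrected file names, running this against each dev database should flag cases like the `ruling.csv` one reported above.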