AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.
MIT License

Bugs found in evaluation that may lead to failure #93

Closed: coding-fish closed this issue 4 months ago

coding-fish commented 8 months ago

When running evaluation with the scripts under llm/src/evaluation, the code raises exceptions for the following reasons:

1 Encoding not specified when loading the JSON file:

with open(dir, 'r') as j:
    json.load(j)

The default encoding 'gbk' cannot load our output, and we can only use 'utf8' to stay consistent with the given database and description files. It would therefore be better to specify the encoding explicitly, like this:

with open(dir, 'r', encoding='utf8') as j:
    json.load(j)

This problem can be found at:

# `llm/src/evaluation.py`
line 9, line 55, line 65

# `llm/src/evaluation_ves.py`
line 80, line 90, line 132
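
Since the same change applies at six call sites, one option is to route them through a single helper. The following is only a minimal sketch: `load_json_utf8` is a hypothetical name and does not exist in the repository.

```python
import json


def load_json_utf8(path):
    """Load a JSON file, always decoding it as UTF-8.

    Hypothetical helper (not part of the repository). It only wraps the
    open(..., encoding=...) fix proposed above, so the result no longer
    depends on the platform's default encoding (for example 'gbk').
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Each `json.load(open(...))` call in the listed lines could then be replaced by `load_json_utf8(path)`, assuming the files are in fact UTF-8 encoded.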

2 Type error after loading the JSON

The output file is organized in the following format:

[
  [
    "What is the highest eligible free rate for K-12 students in the schools in Alameda County?",
    "SELECT MAX(`Free Meal Count (K-12)` / `Enrollment (K-12)`) as HighestEligibleFreeRate  FROM frpm  WHERE `County Name` = 'Alameda'  AND `Free Meal Count (K-12)` IS NOT NULL  AND `Enrollment (K-12)` IS NOT NULL  AND `Enrollment (K-12)` > 0\t----- bird -----\tcalifornia_schools"
  ],
  [
    "Please list the lowest three eligible free rates for students aged 5-17 in continuation schools.",
    "SELECT T1.`Free Meal Count (Ages 5-17)` / T1.`Enrollment (Ages 5-17)` AS Eligible_Free_Rate  FROM frpm AS T1  INNER JOIN schools AS T2  ON T1.`CDSCode` = T2.`CDSCode`  WHERE T2.`EdOpsName` = 'Continuation School' AND T1.`Free Meal Count (Ages 5-17)` IS NOT NULL AND T1.`Enrollment (Ages 5-17)` IS NOT NULL ORDER BY Eligible_Free_Rate ASC  LIMIT 3\t----- bird -----\tcalifornia_schools"
  ],
  [
    "Please list the zip code of all the charter schools in Fresno County Office of Education.",
    "SELECT T2.`Zip`   FROM frpm AS T1   INNER JOIN schools AS T2   ON T1.`CDSCode` = T2.`CDSCode`   WHERE T1.`Charter School (Y/N)` = 1   AND T1.`District Name` = 'Fresno County Office of Education'\t----- bird -----\tcalifornia_schools"
  ],
...
]

After sql_data = json.load(fp), we found that the type of 'sql_data' is 'list', not 'dict', so calling 'sql_data.items()' fails:

    if mode == 'gpt':
        sql_data = json.load(open(sql_path + 'predict_' + data_mode + '.json', 'r', encoding='utf8'))
        for idx, sql_str in sql_data.items():
            ...

The bug can be fixed simply by deleting '.items()'.

This problem can be found at:

# `llm/src/evaluation.py`
line 56

# `llm/src/evaluation_ves.py`
line 81
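
As an alternative to editing the loop directly, the loader could accept both layouts. This is a sketch under assumptions: `iter_predictions` is a hypothetical helper, the list layout is the one shown in the output file above, and the dict layout (index mapped to SQL string) is inferred from what the `.items()` loop expects.

```python
def iter_predictions(sql_data):
    """Yield (index, sql_string) pairs from either layout.

    Hypothetical helper, not the repository's code. Assumed layouts:
      - dict: index -> SQL string (what the .items() loop expects)
      - list: [question, SQL string] pairs (the file shown in this issue)
    """
    if isinstance(sql_data, dict):
        for idx, sql_str in sql_data.items():
            yield idx, sql_str
    elif isinstance(sql_data, list):
        for idx, item in enumerate(sql_data):
            # Each entry is a [question, sql] pair; only the SQL is needed.
            _question, sql_str = item
            yield idx, sql_str
    else:
        raise TypeError(
            f"unsupported prediction format: {type(sql_data).__name__}"
        )
```

The loop would then read `for idx, sql_str in iter_predictions(sql_data):`. Note that with the list layout, plain iteration without `enumerate` would bind `idx` to the question text rather than a position.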

3 Evaluation parameter required

When calculating VES, we found that the parameter --diff_json_path is required; without it, evaluation_ves may raise an error. In addition, no execution result is shown even after a long run, because the result is printed only after compute_ves_by_diff, which needs --diff_json_path. So I recommend marking it as required up front, to remind the user at the beginning. This problem can be found at:

# `llm/src/evaluation_ves.py`
line 163
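
With argparse this amounts to passing `required=True`. A minimal sketch follows; `build_parser` is a hypothetical wrapper, and the script's other arguments are omitted.

```python
import argparse


def build_parser():
    """Build a parser with --diff_json_path marked as required (sketch)."""
    parser = argparse.ArgumentParser(description="VES evaluation (sketch)")
    # required=True makes argparse stop with a usage error at startup
    # when the flag is missing, instead of failing after a long run.
    parser.add_argument("--diff_json_path", type=str, required=True)
    return parser
```

Calling `build_parser().parse_args()` without the flag then exits immediately with a usage message, before any SQL is executed.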

accpatrick commented 5 months ago

@coding-fish Thank you for your interest in our work and for the great suggestions! We will consider them carefully and refine our code soon. Regarding the second suggestion, I recall that I already convert the result list into JSON at line 227 of llm/src/gpt_request.py. I re-ran the code and it seems to work normally. We will keep refining our code, and we are sorry for the inconvenience.