Bugs in the evaluation scripts that can cause failures

When running the evaluation suite in `llm/src/evaluation`, the code raises exceptions for the following reasons:

1 Encoding not specified when loading JSON files

```python
with open(dir, 'r') as j:  # no encoding given: falls back to the platform default
    json.load(j)
```

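In our environment the default encoding is 'gbk', which cannot decode our output files, and we have to use 'utf8' to stay consistent with the provided database and description files. (Without an `encoding` argument, Python's `open()` uses the locale's preferred encoding, which is GBK on Chinese-locale Windows.) It would therefore be better to specify the encoding explicitly, like this: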

```python
with open(dir, 'r', encoding='utf8') as j:
    json.load(j)
```

This problem can be found at:

# `llm/src/evaluation.py`
line 9, line 55, line 65

# `llm/src/evaluation_ves.py`
line 80, line 90, line 132
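One way to apply the fix consistently is to route every JSON read through a single helper. A minimal sketch, assuming the scripts can share such a helper; the name `load_json_utf8` is hypothetical and not part of the repository. Passing `encoding='utf-8'` makes the result independent of the platform's default encoding:

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Load a JSON file as UTF-8, whatever the platform default encoding is.

    Hypothetical helper; the evaluation scripts currently call open() directly.
    """
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# Demo: write a file containing non-ASCII text as UTF-8, then read it back.
sample = [["Qué escuela?", "SELECT 1\t----- bird -----\tcalifornia_schools"]]
fd, tmp_path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    json.dump(sample, f, ensure_ascii=False)
print(load_json_utf8(tmp_path) == sample)  # True
os.remove(tmp_path)
```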

2 Type error after loading JSON

The output file has the following format:

```json
[
  [
    "What is the highest eligible free rate for K-12 students in the schools in Alameda County?",
    "SELECT MAX(`Free Meal Count (K-12)` / `Enrollment (K-12)`) as HighestEligibleFreeRate  FROM frpm  WHERE `County Name` = 'Alameda'  AND `Free Meal Count (K-12)` IS NOT NULL  AND `Enrollment (K-12)` IS NOT NULL  AND `Enrollment (K-12)` > 0\t----- bird -----\tcalifornia_schools"
  ],
  [
    "Please list the lowest three eligible free rates for students aged 5-17 in continuation schools.",
    "SELECT T1.`Free Meal Count (Ages 5-17)` / T1.`Enrollment (Ages 5-17)` AS Eligible_Free_Rate  FROM frpm AS T1  INNER JOIN schools AS T2  ON T1.`CDSCode` = T2.`CDSCode`  WHERE T2.`EdOpsName` = 'Continuation School' AND T1.`Free Meal Count (Ages 5-17)` IS NOT NULL AND T1.`Enrollment (Ages 5-17)` IS NOT NULL ORDER BY Eligible_Free_Rate ASC  LIMIT 3\t----- bird -----\tcalifornia_schools"
  ],
  [
    "Please list the zip code of all the charter schools in Fresno County Office of Education.",
    "SELECT T2.`Zip`   FROM frpm AS T1   INNER JOIN schools AS T2   ON T1.`CDSCode` = T2.`CDSCode`   WHERE T1.`Charter School (Y/N)` = 1   AND T1.`District Name` = 'Fresno County Office of Education'\t----- bird -----\tcalifornia_schools"
  ],
...
]
```
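Each element is a two-item list `[question, prediction]`, where the prediction string packs the SQL and the database id around the `\t----- bird -----\t` separator. A minimal sketch of splitting one element, assuming exactly that layout; `parse_entry` is a hypothetical name, not a function from the repository:

```python
SEPARATOR = '\t----- bird -----\t'


def parse_entry(entry):
    """Split one [question, prediction] pair into (question, sql, db_id).

    Assumes the prediction string is '<sql>' + SEPARATOR + '<db_id>'.
    rsplit with maxsplit=1 keeps the db_id intact even if the SQL text
    happened to contain the separator.
    """
    question, prediction = entry
    sql, db_id = prediction.rsplit(SEPARATOR, 1)
    return question, sql, db_id


example = [
    "Please list the zip code of all the charter schools in Fresno County Office of Education.",
    "SELECT T2.`Zip` FROM frpm AS T1 INNER JOIN schools AS T2 ON T1.`CDSCode` = T2.`CDSCode` "
    "WHERE T1.`Charter School (Y/N)` = 1 AND T1.`District Name` = 'Fresno County Office of Education'"
    "\t----- bird -----\tcalifornia_schools",
]
question, sql, db_id = parse_entry(example)
print(db_id)  # california_schools
```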

After `sql_data = json.load(fp)`, `sql_data` is a `list`, not a `dict`, so calling `sql_data.items()` raises `AttributeError: 'list' object has no attribute 'items'`:

```python
if mode == 'gpt':
    sql_data = json.load(open(sql_path + 'predict_' + data_mode + '.json', 'r', encoding='utf8'))
    for idx, sql_str in sql_data.items():  # fails: sql_data is a list here, not a dict
        ...
```

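The bug can be fixed simply by deleting `.items()`. Each element is then a two-item `[question, sql]` list that unpacks into `idx, sql_str`, so `sql_str` receives the SQL string (note that `idx` then holds the question text rather than an index).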

This problem can be found at:

# `llm/src/evaluation.py`
line 56

# `llm/src/evaluation_ves.py`
line 81
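If the scripts should accept both this list layout and the dict layout that the original `.items()` call expects, one option is to normalize before looping. A minimal sketch under that assumption; `iter_predictions` is a hypothetical helper, and using `enumerate` for the list case (so `idx` stays an index instead of the question text) is a design choice of this sketch, not the repository's code:

```python
def iter_predictions(sql_data):
    """Yield (idx, sql_str) from a dict or from a list of [question, sql_str] pairs."""
    if isinstance(sql_data, dict):
        # Dict layout: JSON keys are index strings, values are prediction strings.
        yield from sql_data.items()
    elif isinstance(sql_data, list):
        # List layout: each element is [question, sql_str]; keep only sql_str.
        for idx, entry in enumerate(sql_data):
            yield idx, entry[1]
    else:
        raise TypeError(f'unsupported prediction format: {type(sql_data).__name__}')


as_list = [["q0", "SELECT 1\t----- bird -----\tdb"], ["q1", "SELECT 2\t----- bird -----\tdb"]]
as_dict = {"0": "SELECT 1\t----- bird -----\tdb", "1": "SELECT 2\t----- bird -----\tdb"}
print(list(iter_predictions(as_list)))
print(list(iter_predictions(as_dict)))
```

With such a helper, the loop would read `for idx, sql_str in iter_predictions(sql_data):` regardless of which layout the prediction file uses.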

3 Evaluation parameter should be required

When calculating VES, we found that the parameter `--diff_json_path` is effectively required: if it is omitted, `evaluation_ves` fails. Worse, nothing is printed until after a long run, because results are only printed after `compute_ves_by_diff`, which needs `--diff_json_path`. I therefore recommend making this parameter required, so that users are warned at startup. This problem can be found at:

# `llm/src/evaluation_ves.py`
line 163
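A minimal sketch of the suggested change, assuming `evaluation_ves.py` parses its options with `argparse`; the other options are omitted, and the existence check on the file is an extra suggestion of this sketch rather than existing behavior. With `required=True`, `argparse` rejects a missing `--diff_json_path` before any SQL is executed:

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(description='VES evaluation (sketch)')
    # required=True makes argparse exit with a usage error at startup
    # when the flag is missing, instead of failing after all SQL has run.
    parser.add_argument('--diff_json_path', type=str, required=True,
                        help='path to the JSON file used by compute_ves_by_diff')
    # ... other existing options omitted ...
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Extra fail-fast check (assumption): the file must exist before evaluation starts.
    if not os.path.isfile(args.diff_json_path):
        parser.error(f'--diff_json_path not found: {args.diff_json_path}')
    return args
```

Running the script without the flag then exits immediately with a usage error and exit status 2.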
