This pull request introduces several improvements and new features to the CLI commands:
Regeneration Support
Added the --allow-overwrite flag to the generate command.
This option allows regeneration of test entries even if some entries already exist.
The flag is only valid for the generate command.
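As a rough illustration of the intended semantics, the sketch below assumes that without --allow-overwrite, entries that already have a result are skipped, and that with the flag every requested entry is regenerated. The function name and arguments are hypothetical and are not taken from the PR's code.

```python
def select_entries_to_generate(requested_ids, existing_ids, allow_overwrite=False):
    """Return the test entry IDs to generate, in request order.

    Assumption: entries that already exist are skipped unless
    allow_overwrite is True. This helper is hypothetical.
    """
    if allow_overwrite:
        # Regenerate everything that was requested, existing or not.
        return list(requested_ids)
    existing = set(existing_ids)
    return [entry_id for entry_id in requested_ids if entry_id not in existing]
```

Under this assumption, a partially completed run can be resumed by default, and passing the flag forces a full regeneration.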
Selective Test Entry Execution
Introduced a new --run-ids flag:
When set, the command reads a list of test entry IDs from the file test_case_ids_to_generate.json.
Only those test entries are run, instead of the entire category.
This flag is also exclusive to the generate command and cannot be combined with --test-category.
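The mutual exclusion between --run-ids and --test-category could be expressed roughly as follows. This is a minimal sketch using argparse from the standard library; the actual CLI may be built with a different framework. The layout assumed for test_case_ids_to_generate.json (a mapping from category name to a list of IDs) is a guess for illustration only, and both function names are hypothetical.

```python
import argparse
import json
from pathlib import Path


def build_parser():
    """Hypothetical parser in which --run-ids and --test-category conflict."""
    parser = argparse.ArgumentParser(prog="generate-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--run-ids", action="store_true",
                       help="read test entry IDs from a JSON file")
    group.add_argument("--test-category", nargs="+",
                       help="run every entry in the given categories")
    return parser


def load_run_ids(path):
    """Flatten the ID file into one list.

    Assumed layout (not confirmed by the PR text):
    {"category_name": ["id_1", "id_2"], ...}
    """
    data = json.loads(Path(path).read_text())
    ids = []
    for category_ids in data.values():
        ids.extend(category_ids)
    return ids
```

With this setup, passing both flags makes argparse exit with a usage error before any generation starts.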
Customizable Result and Score Directories
Added --score-dir and --result-dir options for both the generate and evaluate commands.
These options allow users to specify custom paths for result and score directories.
Paths should be relative to the root folder of berkeley-function-call-leaderboard.
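The relative-path rule can be sketched with pathlib. The helper name and the default directory names used in the usage note are assumptions for illustration and may not match the project's code.

```python
from pathlib import Path


def resolve_dir(project_root, custom_dir, default_name):
    """Resolve a result or score directory against the project root.

    Hypothetical helper: custom_dir is the value passed to --result-dir
    or --score-dir (None when the option is omitted), and default_name
    is an assumed fallback directory name.
    """
    root = Path(project_root)
    return root / (custom_dir if custom_dir else default_name)
```

For example, `resolve_dir(root, "runs/exp1/result", "result")` would point at `runs/exp1/result` under the root folder, while `resolve_dir(root, None, "result")` would fall back to the assumed default directory.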
In addition, this PR updates the check_illegal_python_param_name.py script so that it no longer stores the functions in the multi_turn categories. This change does not affect the accuracy of the dataset.