ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0

[BFCL] Add Support for Regeneration, Specific Test Entry IDs, and Custom Directory Locations #743

Closed Raymond112514 closed 1 day ago

Raymond112514 commented 2 weeks ago

This pull request introduces several improvements and new features to the CLI commands:

  1. Regeneration Support

    • Added the --allow-overwrite flag to the generation command.
    • This option allows regeneration of test entries even if some entries already exist.
    • The flag is only valid for the generate command.
  2. Selective Test Entry Execution

    • Introduced a new --run-ids flag:
      • When set, the command reads the list of test entry IDs from the file test_case_ids_to_generate.json.
      • Only those specific test IDs will be executed, instead of the entire category.
      • This feature is also exclusive to the generate command, and cannot be used together with --test-category.
  3. Customizable Result and Score Directories

    • Added --score-dir and --result-dir options for both the generate and evaluate commands.
    • These options allow users to specify custom paths for result and score directories.
    • Paths should be relative to the root folder of berkeley-function-call-leaderboard.
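To make the flag rules above concrete, here is a minimal sketch of how the generate command's options could be parsed. It is not the code in this PR: it uses argparse purely for illustration, and the names build_generate_parser, resolve_dir, PROJECT_ROOT, and the default folder names "result" and "score" are assumptions.

```python
import argparse
from pathlib import Path

# Hypothetical stand-in for the root folder of berkeley-function-call-leaderboard.
PROJECT_ROOT = Path("berkeley-function-call-leaderboard")


def build_generate_parser() -> argparse.ArgumentParser:
    """Illustrative parser for the generate command (assumed layout)."""
    parser = argparse.ArgumentParser(prog="generate")
    # --run-ids cannot be combined with --test-category.
    selection = parser.add_mutually_exclusive_group()
    selection.add_argument("--test-category", nargs="+", default=None)
    selection.add_argument("--run-ids", action="store_true")
    # Regenerate entries even if results for them already exist.
    parser.add_argument("--allow-overwrite", action="store_true")
    # Custom directories, given relative to the project root.
    parser.add_argument("--result-dir", default=None)
    parser.add_argument("--score-dir", default=None)
    return parser


def resolve_dir(user_path, default_name):
    """Resolve a directory relative to PROJECT_ROOT, falling back to a default name."""
    return PROJECT_ROOT / (user_path if user_path else default_name)
```

In this sketch, passing --run-ids together with --test-category makes argparse exit with a usage error, and resolve_dir(args.result_dir, "result") returns the custom path when one is given and the assumed default otherwise.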

In addition, this PR contains an update to the check_illegal_python_param_name.py script to avoid storing the functions in the multi_turn categories; it won't affect the accuracy of the dataset.

Raymond112514 commented 2 weeks ago

Added the --rerun-all flag. When this flag is present, the results are overwritten. Changed the logic of collect_test_case slightly.
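As a rough illustration of the overwrite behavior described in this comment (not the actual collect_test_case logic, which is not shown here), the selection step could look like the sketch below. The function name select_test_cases, the rerun_all parameter, and the "id" key on each entry are assumptions.

```python
def select_test_cases(test_cases, existing_ids, rerun_all=False):
    """Return the entries to generate.

    test_cases: list of dicts, each with a hypothetical "id" key.
    existing_ids: set of IDs that already have a stored result.
    rerun_all: when True, existing results are ignored and will be overwritten.
    """
    if rerun_all:
        return list(test_cases)
    # Without the flag, skip entries that already have results.
    return [case for case in test_cases if case["id"] not in existing_ids]
```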

Raymond112514 commented 2 weeks ago

Added the --result-dir and --score-dir options.

CharlieJCJ commented 1 day ago

Testing on my side:

[two screenshots]