OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Open domain usage #42

Closed yonatanbitton closed 1 year ago

yonatanbitton commented 1 year ago

Hello, and thanks for your great work 🙌 I'm trying to run the open-domain code on three tasks: (1) web search; (2) calculator; (3) claim checking given evidence. Does your model support such tasks?

This is the command:

python toolbench/inference/qa_pipeline_open_domain.py --tool_root_dir data/toolenv/tools/ --corpus_tsv_path data/retrieval/G1/corpus.tsv --retrieval_model_path ToolBench_IR_bert_based_uncased --retrieved_api_nums 5 --backbone_model toolllama --model_path huggyllama/llama-7b --lora --lora_path ToolLLaMA-7b-LoRA --max_observation_length 512 --method DFS_woFilter_w2 --input_query_file data/instruction/inference_query_demo_open_domain_custom.json --output_answer_file data/answer/toolllama_lora_dfs_open_domain --rapidapi_key

This is the input:

[
    {
        "query": "How old is Joe Biden?",
        "query_id": 9999999991
    },
    {
        "query": "Solve this equation: 2x + 3 = 7",
        "query_id": 9999999992
    },
    {
        "query": "CLAIM: Jamison Crowder is a basketball player.\nEVIDENCE: Jamison Crowder: Jamison Wesley Crowder (born June 17, 1993) is an American football wide receiver for the New York Jets of the National Football League (NFL). He played college football at Duke, and was drafted by the Washington Redskins in the fourth round of the 2015 NFL Draft.\nQUESTION: Is the claim correct based on the evidence?",
        "query_id": 9999999993
    }
]
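For reference, a query file with this shape can be generated programmatically. The following is a minimal sketch, assuming the pipeline only needs the `query` and `query_id` fields shown above; the helper names `build_queries` and `write_query_file` are hypothetical and not part of ToolBench.

```python
import json
from pathlib import Path


def build_queries(questions, start_id=9999999991):
    """Pair each question with a unique integer query_id (hypothetical helper).

    The starting id simply mirrors the ids used in the input above.
    """
    return [
        {"query": question, "query_id": start_id + offset}
        for offset, question in enumerate(questions)
    ]


def write_query_file(questions, path):
    """Write the queries as a JSON array and return the list that was written."""
    queries = build_queries(questions)
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        json.dumps(queries, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return queries
```

Calling `write_query_file([...], "data/instruction/inference_query_demo_open_domain_custom.json")` would produce a file that can be passed to `--input_query_file`.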

This is the output I receive:

process[0] doing task 0/3: real_task_id_9999999992
Retrieving...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49/49 [00:10<00:00,  4.84it/s]
[process(0)]now playing Solve this equation: 2x + 3 = 7, with 6 APIs
[process(0)]total tokens: 2
Thought:
Action: get_equations_for_physical_quantities_constants_and_equations
{}
query to Education-->physical_quantities_constants_and_equations-->get_equations_for_physical_quantities_constants_and_equations
0.025010755222666936
Action Input: {}
Observation: {"error": "", "response": "[{'quantity': 'Acceleration', 'categories': [{'title': 'Forces in Mechanics', 'equations': [{'equation': 'F / m'}], 'locals': {'de': {'title': 'Kr\u00e4fte in der Mechanik'}}}, {'title': 'Movement', 'equations': [{'equation': 'v / t'}], 'locals': {'de': {'title': 'Beliebige Bewegung'}}}, {'title': 'Evenly accelerated movement', 'equations': [{'equation': '( v - v_0 ) / t'}, {'equation': '2 * ( s - s_0 - v_0 * t ) / t^2'}, {'equation': '2 * s / t^2', 'conditions': ['`s_0 = 0`', '`v...
[process(0)]total tokens: 2
Thought:
Action: solve_for_sudoku_all_purpose_pro
{}
query to Gaming-->sudoku_all_purpose_pro-->solve_for_sudoku_all_purpose_pro
0.27502931836911926
Action Input: {}
Observation: {"error": "", "response": "{'message': 'This endpoint is disabled for your subscription'}"}
[process(0)]total tokens: 2
Thought:
Action: solve_for_sudoku_all_purpose_pro
{}
query to Gaming-->sudoku_all_purpose_pro-->solve_for_sudoku_all_purpose_pro
0.22321073814882275
Action Input: {}
Observation: {"error": "", "response": "{'message': 'This endpoint is disabled for your subscription'}"}
[process(0)]total tokens: 2
Thought:

Do you support these APIs? Why is it disabled for my subscription?

I also asked about it on RapidAPI: https://rapidapi.com/myvatAPI/api/sudoku-all-purpose-pro/discussions/98039

Thanks

thuqinyj16 commented 1 year ago

Hi,

Thanks for your interest in our work! If you have subscribed to the API, calling the endpoint should work. Is this the only API that is disabled, or are all of your subscribed APIs disabled? We are trying to determine whether the problem comes from RapidAPI Hub or from the server hosting this specific API.

Thanks!
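One way to check whether only one API is affected is to tally the observations in the inference log per tool. This is a minimal sketch, assuming the log keeps the `query to <category>--><tool>--><api>` and `Observation:` line formats shown in the output above; `tally_disabled` is a hypothetical helper, not part of ToolBench.

```python
import re
from collections import Counter

# Matches lines such as:
#   query to Gaming-->sudoku_all_purpose_pro-->solve_for_sudoku_all_purpose_pro
QUERY_RE = re.compile(r"^query to (?P<category>.+?)-->(?P<tool>.+?)-->(?P<api>.+)$")

# Message copied from the observation in the log above.
DISABLED_MARKER = "This endpoint is disabled for your subscription"


def tally_disabled(log_text):
    """Return {tool: (disabled_calls, total_calls)} for a pasted log (hypothetical helper)."""
    calls = Counter()
    disabled = Counter()
    current_tool = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = QUERY_RE.match(line)
        if match:
            # Remember which tool the next observation belongs to.
            current_tool = match.group("tool")
            calls[current_tool] += 1
        elif line.startswith("Observation:") and current_tool is not None:
            if DISABLED_MARKER in line:
                disabled[current_tool] += 1
            current_tool = None
    return {tool: (disabled[tool], calls[tool]) for tool in calls}
```

If every tool in the result shows all of its calls as disabled, the problem is more likely on the subscription or RapidAPI Hub side; if only one tool does, the server behind that specific API is the more likely cause.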

yonatanbitton commented 1 year ago

Thanks for the response. I am subscribed to "solve_for_sudoku_all_purpose_pro" but it says it is disabled:

image

But more generally, do you support the tasks I listed? Are there more suitable APIs, for example for web search or for a calculator? Thank you

thuqinyj16 commented 1 year ago

Hi, it seems that solve_for_sudoku_all_purpose_pro has an internal error on its hosting server, which suggests that this API is poorly served on RapidAPI. I would suggest choosing another API with similar functionality (our API retriever may help you find a suitable one). Not all the APIs on this hub work; some have internal errors. We filtered out those low-quality APIs before our experiments.

yonatanbitton commented 1 year ago

Thanks. I'm not sure solve_for_sudoku_all_purpose_pro is the most fitting API. I did try the open-domain pipeline, which uses the retriever, and provided its outputs above. I don't understand the "equation" output either:

query to Education-->physical_quantities_constants_and_equations-->get_equations_for_physical_quantities_constants_and_equations
0.025010755222666936
Action Input: {}
Observation: {"error": "", "response": "[{'quantity': 'Acceleration', 'categories': [{'title': 'Forces in Mechanics', 'equations': [{'equation': 'F / m'}], 'locals': {'de': {'title': 'Kr\u00e4fte in der Mechanik'}}}, {'title': 'Movement', 'equations': [{'equation': 'v / t'}], 'locals': {'de': {'title': 'Beliebige Bewegung'}}}, {'title': 'Evenly accelerated movement', 'equations': [{'equation': '( v - v_0 ) / t'}, {'equation': '2 * ( s - s_0 - v_0 * t ) / t^2'}, {'equation': '2 * s / t^2', 'conditions': ['`s_0 = 0`', '`v...

I need two basic tasks: a query generation / web search task, and a task that requires a calculator / Wolfram Alpha. Does ToolBench support these tasks? I tried both the open-domain and closed-domain settings but didn't find a relevant way to do it.

{
    "query": "How old is Joe Biden?",
    "query_id": 9999999991
},
{
    "query": "Solve this equation: 2x + 3 = 7",
    "query_id": 9999999992
}