AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.
MIT License
1.08k stars 176 forks

QueryHistoryToday description mismatch #142

Open Lucaweihs opened 3 months ago

Lucaweihs commented 3 months ago

Issue

The description of the QueryHistoryToday API in code ("This API queries the history of the given date.") does not match the description in the jsonl files QueryHistoryToday-level-3-2.jsonl and QueryHistoryToday-level-3-3.jsonl ("This API queries the history of a given user today."). This seems to mean you are guaranteed to fail on these examples, given the strict == check in the check_api_call_correctness method of tool_search.py.

Suggested fixes

  1. Change

    response['output'] == groundtruth['output']

    to

    response['api_name'] == groundtruth['api_name']

    as we only really care that it found the correct API, and we already know what that API's description is.

  2. Update the jsonl files to use the correct description.
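A minimal sketch of the first fix, assuming `response` and `groundtruth` are dicts with `api_name` and `output` keys as in the comparison above. The helper name `same_api` and the shape of the `output` values are hypothetical, chosen only for illustration, and are not taken from tool_search.py.

```python
def same_api(response: dict, groundtruth: dict) -> bool:
    """Return True when both records name the same API (hypothetical helper)."""
    return response["api_name"] == groundtruth["api_name"]


# Illustrative records only; the structure of 'output' is an assumption.
groundtruth = {
    "api_name": "QueryHistoryToday",
    "output": {"description": "This API queries the history of a given user today."},
}
response = {
    "api_name": "QueryHistoryToday",
    "output": {"description": "This API queries the history of the given date."},
}

# The strict comparison fails because the two descriptions differ.
strict_match = response["output"] == groundtruth["output"]
# Comparing names only accepts the response as the correct API.
name_match = same_api(response, groundtruth)
print(strict_match, name_match)
```

With the name-only comparison, a stale description in a jsonl file no longer causes a guaranteed failure, although fixing the files (option 2) would still be worthwhile.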

Lucaweihs commented 3 months ago

Also, I believe SymptomSearch-AppointmentRegistration-level-2-1.jsonl is invalid as the API call is simply stated in text

{"role": "AI", "text": "Great. Let me register your account now. [RegisterUser(username='user4', password='password4', email='user4@example.com')]"}

rather than being properly executed using the API role.
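One way to find other files with the same problem might be to scan each jsonl turn for a bracketed call that appears in a non-API turn. This is only a sketch: the function name `find_inline_calls`, the regular expression, and the "User" sample line are my own assumptions, and the pattern only catches calls written in the `[Name(...)]` style shown above.

```python
import json
import re

# Matches a bracketed call such as [RegisterUser(username='user4', ...)].
INLINE_CALL = re.compile(r"\[\w+\(.*\)\]")


def find_inline_calls(lines):
    """Return indices of non-API turns whose text contains a bracketed call."""
    flagged = []
    for index, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        turn = json.loads(line)
        text = turn.get("text") or ""
        if turn.get("role") != "API" and INLINE_CALL.search(text):
            flagged.append(index)
    return flagged


# The first line is a made-up turn; the second is the turn quoted above.
sample = [
    '{"role": "User", "text": "Please register my account."}',
    '{"role": "AI", "text": "Great. Let me register your account now. '
    "[RegisterUser(username='user4', password='password4', "
    "email='user4@example.com')]\"}",
]
flagged = find_inline_calls(sample)
print(flagged)
```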

Lucaweihs commented 3 months ago

More issues:

  1. There are unexpected case changes between the saved jsonl files and the databases (e.g. "blinds" in Scenes.json does not match "blinds" in QueryScene-level-1-1.jsonl).
  2. Random numbers are never seeded (e.g. appointment_id = str(random.randint(10000000, 99999999)) in appointment_registration.py), so the generated values change across runs, making results incomparable.
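For the seeding point, a minimal sketch of what a reproducible version could look like, assuming the ID is generated from a dedicated `random.Random` instance. The function name `make_appointment_id` and the seed value are hypothetical and not taken from appointment_registration.py.

```python
import random


def make_appointment_id(rng: random.Random) -> str:
    """Draw an 8-digit appointment ID from the supplied generator."""
    return str(rng.randint(10000000, 99999999))


SEED = 0  # arbitrary value, chosen only for illustration

# Two generators built from the same seed produce the same ID,
# so repeated evaluation runs can be compared against each other.
first = make_appointment_id(random.Random(SEED))
second = make_appointment_id(random.Random(SEED))
print(first == second)
```

Passing the generator in explicitly, rather than calling `random.seed` globally, keeps the seeding local to the code that needs it.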

Lucaweihs commented 3 months ago

Another: