AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.
MIT License

Unable to download the dataset from huggingface #97

Open XuanRen4470 opened 8 months ago

XuanRen4470 commented 8 months ago

I tried to download the dataset with the Hugging Face datasets library:


from datasets import load_dataset

dataset = load_dataset("liminghao1630/API-Bank")

but it failed with the following error:

An error occurred while generating the dataset

ValueError: Couldn't cast
input: string
file: string
id: int64
expected_output: string
instruction: string
to
{'input': Value(dtype='string', id=None), 'instruction': Value(dtype='string', id=None), 'output': Value(dtype='string', id=None)}
because column names don't match

I also tried downloading the JSON files directly to my local machine, but when I read them, the input looks like JSON rather than a plain text string.

For example, this is the first instruction:

"\nGenerate an API request in the format of [ApiName(key1='value1', key2='value2', ...)] based on the previous dialogue context.\nThe current time is 2039-03-09 18:56:09 Wednesday.\nInput: \nUser: User's utterence\nAI: AI's response\n\nExpected output:\nAPI-Request: [ApiName(key1='value1', key2='value2', ...)]\n\nAPI descriptions:\n"

How do I load the dataset properly?

Thank you for your help

liminghao1630 commented 7 months ago

@XuanRen4470 Yes, please download the files directly and load them with json. You can refer to evaluator_by_json.py for the loading code.
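
For readers following this advice, here is a minimal sketch of loading a downloaded file with Python's json module. It is not the code from evaluator_by_json.py. It assumes each file is either a single JSON array or JSON Lines, and that the answer field is named expected_output in some files and output in others, as the traceback above suggests. The helper names load_records and normalize are made up for illustration.

```python
import json
from pathlib import Path


def load_records(path):
    """Load a list of dict records from a JSON array file or a JSON Lines file."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        # Works when the whole file is one JSON value (array or object).
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, dict):
        # A file holding a single object is treated as one record.
        data = [data]
    return data


def normalize(record):
    """Map both column layouts onto one schema (assumed from the traceback)."""
    return {
        "instruction": record.get("instruction", ""),
        "input": record.get("input", ""),
        # Assumption: the answer lives under 'expected_output' or 'output'.
        "output": record.get("expected_output", record.get("output", "")),
    }
```

Calling `[normalize(r) for r in load_records("path/to/downloaded_file.json")]` (the path is a placeholder) yields records with the same three keys regardless of which layout the file uses, which avoids the column mismatch that load_dataset reports.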

zfchenUnique commented 7 months ago

Hi Minghao, thanks for the great work! Could you please share some results and examples showing how to evaluate existing models on your benchmark, for instance how to obtain the results for GPT-3.5? That would make it easy for others to perform an apples-to-apples comparison on the benchmark.