ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0

[Apibench] - Error while loading Gorilla Dataset with HuggingFace #101

Status: Open. AnkitaNaik opened this issue 1 year ago.

AnkitaNaik commented 1 year ago

Describe the issue

Loading the Gorilla dataset with Hugging Face datasets raises json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 2103).

ID datapoint

  1. Datapoint permalink: https://huggingface.co/datasets/gorilla-llm/APIBench
  2. Provider: TorchHub/HuggingFace/PyTorch Hub
  3. Gorilla repo commit #:

What is the issue

from datasets import load_dataset
dataset = load_dataset("gorilla-llm/APIBench", split = 'train')

These commands raise json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 2103). Is there a way to load the JSON files through an API?
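"Extra data: line 2 column 1" is the message Python's json module raises when text holding more than one top-level JSON value, such as one object per line, is parsed as a single document. That suggests, as an assumption not confirmed in this thread, that the APIBench files are JSON Lines. The minimal stdlib sketch below reproduces the error on a small made-up sample and parses the same text line by line instead. The helper parse_json_lines and the sample records are hypothetical and are not part of Gorilla or Hugging Face datasets.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text: one JSON value per non-empty line (hypothetical helper)."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
    return records


# Hypothetical sample: two records on separate lines, as in a JSON Lines file.
sample = '{"domain": "a", "api_arguments": ["x"]}\n{"domain": "b", "api_arguments": "y"}\n'

# Parsing the whole text as one JSON document fails with "Extra data".
try:
    json.loads(sample)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg
    print("json.loads on the whole text failed:", exc)

# Parsing it one line at a time succeeds.
rows = parse_json_lines(sample)
print(len(rows), "records parsed")
```

Note that the two sample records already disagree on the type of api_arguments. Line-by-line parsing gets past the JSONDecodeError, but it does not by itself give every row the same schema.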

ShishirPatil commented 1 year ago

Hey @AnkitaNaik, were you able to figure this out? If not, can you give me the steps to reproduce it?

hanseungwook commented 4 months ago

@ShishirPatil Hey, it seems the problem still hasn't been solved. It happens because some rows use different data types for the same column. For example, api_arguments is an array in some rows and a string in others, and the same is true of python_environment_requirements.
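Hugging Face datasets stores data as an Apache Arrow table, which expects a single type per column. The inconsistency described above is therefore the kind of thing that stops loading. The short stdlib sketch below reports which columns mix Python types across rows once the records are parsed into dicts. The function name find_mixed_type_columns and the sample rows are hypothetical, and the rows only mirror the list-versus-string mix reported here. They are not copied from APIBench.

```python
from collections import defaultdict


def find_mixed_type_columns(rows):
    """Return {column: sorted type names} for columns whose values differ in type across rows."""
    seen = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            seen[key].add(type(value).__name__)
    # Keep only columns that hold more than one type. A column missing from
    # some rows is not flagged unless its present values also differ in type.
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


# Hypothetical rows mirroring the inconsistency described in this comment.
rows = [
    {"api_call": "f()", "api_arguments": ["x", "y"], "python_environment_requirements": ["torch"]},
    {"api_call": "g()", "api_arguments": "x, y", "python_environment_requirements": "torch"},
]

mixed = find_mixed_type_columns(rows)
print(mixed)
```

Running this check over every parsed row before building a dataset shows which columns need normalizing.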

How can we solve this quickly?
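One quick workaround, assuming it is acceptable to store these two columns as text, is to coerce every value in them to a string before building the dataset. Strings stay as they are, and lists (or anything else) are serialized with json.dumps. This is only one possible policy, not the project's fix. The helper names and sample rows are hypothetical, and the list of affected columns comes from this thread, so other columns may need the same treatment.

```python
import json

# Columns reported in this thread as having inconsistent types.
MIXED_COLUMNS = ("api_arguments", "python_environment_requirements")


def to_text(value):
    """Return strings unchanged; serialize anything else (list, dict, None) as JSON text."""
    return value if isinstance(value, str) else json.dumps(value)


def normalize_rows(rows, columns=MIXED_COLUMNS):
    """Return copies of rows in which each listed column always holds a str."""
    normalized = []
    for row in rows:
        fixed = dict(row)  # shallow copy so the caller's rows are not mutated
        for column in columns:
            if column in fixed:
                fixed[column] = to_text(fixed[column])
        normalized.append(fixed)
    return normalized


# Hypothetical rows with the list-versus-string mix described above.
rows = [
    {"api_arguments": ["x", "y"], "python_environment_requirements": ["torch"]},
    {"api_arguments": "x, y", "python_environment_requirements": "torch"},
]

clean = normalize_rows(rows)
for row in clean:
    print(row)
```

Once every row holds a string in these columns, the list of dicts could be passed to datasets.Dataset.from_list in recent versions of datasets. That step is untested here. Serializing to JSON text keeps the original structure recoverable with json.loads, at the cost of making downstream code decode it.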

hanseungwook commented 4 months ago

The steps to reproduce are exactly as Ankita described above:

from datasets import load_dataset
dataset = load_dataset("gorilla-llm/APIBench", split = 'train')