FudanDISC / DISC-LawLLM

DISC-LawLLM, an intelligent legal system utilizing large language models (LLMs) to provide a wide range of legal services
Apache License 2.0

SFT datasets error #18

Open arthasyou opened 10 months ago

arthasyou commented 10 months ago

from datasets import load_dataset

dataset = load_dataset("ShengbinYue/DISC-Law-SFT")

Error:
Generating train split: 166758 examples [00:00, 184286.58 examples/s]
Traceback (most recent call last):
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
    writer.write_table(table)
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/arrow_writer.py", line 572, in write_table
    pa_table = table_cast(pa_table, self._schema)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/table.py", line 2328, in table_cast
    return cast_table_to_schema(table, schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/table.py", line 2286, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
id: string
reference: list<item: string>
  child 0, item: string
input: string
output: string
to
{'id': Value(dtype='string', id=None), 'input': Value(dtype='string', id=None), 'output': Value(dtype='string', id=None)}
because column names don't match

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2153, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1813, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
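
Reading the traceback: the builder expects the columns `id`, `input`, `output`, but one of the tables it reads also carries a `reference` column (`list<item: string>`). That suggests, though the thread does not confirm it, that the repository holds several JSON Lines files with different column sets, and that `load_dataset` tries to merge them into a single `train` split. A minimal sketch for checking this on locally downloaded copies follows. It uses only the standard library, and the file names `pair.jsonl` and `triplet.jsonl` and the helpers `columns_of` and `group_by_columns` are hypothetical, introduced for illustration only.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def columns_of(path: Path, max_lines: int = 1000) -> frozenset:
    """Collect the union of top-level keys seen in the first max_lines lines."""
    keys = set()
    with path.open(encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            if line.strip():  # skip blank lines
                keys.update(json.loads(line).keys())
    return frozenset(keys)


def group_by_columns(paths):
    """Map each distinct column set to the list of files that use it."""
    groups = defaultdict(list)
    for path in paths:
        groups[columns_of(Path(path))].append(str(path))
    return dict(groups)


# Demo on two tiny stand-in files (hypothetical names and records,
# shaped after the two schemas shown in the traceback).
with tempfile.TemporaryDirectory() as tmp:
    pair = Path(tmp) / "pair.jsonl"
    triplet = Path(tmp) / "triplet.jsonl"
    pair.write_text(
        json.dumps({"id": "p0", "input": "q", "output": "a"}) + "\n",
        encoding="utf-8",
    )
    triplet.write_text(
        json.dumps({"id": "t0", "reference": ["r"], "input": "q", "output": "a"}) + "\n",
        encoding="utf-8",
    )
    groups = group_by_columns([pair, triplet])
    for cols, files in groups.items():
        print(sorted(cols), [Path(f).name for f in files])
```

If the real files do fall into more than one group, a likely workaround is to load each group separately by passing its files through the `data_files` argument of `load_dataset`, so that files with and without `reference` are never cast to the same schema.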