conceptofmind closed this issue 4 months ago
Can you tell me which sub dataset? What was the exact code you ran?
Hi,
The error did not clearly indicate which subset was causing the issue. From some testing my peers and I did, unified_p3.jsonl.gz seems to be part of the problem, but there are still errors even after removing that file.
The dataset is from the Huggingface hub: https://huggingface.co/datasets/laion/OIG/tree/main
The code is pretty standard:
from datasets import load_dataset
dataset = load_dataset('laion/OIG', split='train')
Thank you,
Enrico
This error also occurs if you remove unified_p3:
raise TypeError(f"Couldn't cast array of type\n{array.type}\nto\n{feature}")
TypeError: Couldn't cast array of type
struct<labels: list<item: string>, source: string>
to
{'source': Value(dtype='string', id=None)}
The above exception was the direct cause of the following exception:
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
So there may be a few different files that have issues.
OK, thank you. We will check p3 and see if we can track it down. p3 is huge, so I actually don't use load_dataset; I load it using json.
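For anyone trying to narrow down which files disagree, here is a minimal sketch of the "load it using json" approach. It assumes the .jsonl.gz files have already been downloaded locally; the helper names (shape_of, shapes_in_file, group_files_by_shape) and the max_records cutoff are made up for illustration and are not part of datasets or the OIG repo. It reads each file with gzip and json and groups files by the nested key structure of their records, so a file whose records carry an extra field (such as the labels list in the traceback above) ends up in its own group.

```python
import gzip
import json
from collections import defaultdict


def shape_of(value):
    """Return a hashable description of a JSON value's structure."""
    if isinstance(value, dict):
        # Keys are unique, so sorting the (key, shape) pairs is well defined.
        return tuple(sorted((k, shape_of(v)) for k, v in value.items()))
    if isinstance(value, list):
        # Describe a list by the distinct shapes of its items.
        return ("list", tuple(sorted({repr(shape_of(v)) for v in value})))
    return type(value).__name__


def shapes_in_file(path, max_records=1000):
    """Collect the distinct record shapes in the first max_records lines."""
    shapes = set()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for i, line in enumerate(handle):
            if i >= max_records:
                break
            line = line.strip()
            if line:
                shapes.add(shape_of(json.loads(line)))
    return shapes


def group_files_by_shape(paths, max_records=1000):
    """Map each record shape to the list of files that contain it."""
    groups = defaultdict(list)
    for path in paths:
        for shape in shapes_in_file(path, max_records):
            groups[shape].append(path)
    return dict(groups)
```

Calling group_files_by_shape on the list of local file paths and printing the result should show which files share a structure and which ones differ. Only the first max_records lines of each file are sampled, so a mismatch deeper in a large file like p3 could be missed; raise the limit if the first pass shows nothing.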
Hi all,
Thanks for the awesome work.
I am receiving this error when trying to load the OIG dataset from Huggingface:
Any input would be greatly appreciated.
Thank you,
Enrico