Closed by ruby11dog 4 months ago
Hi, @ruby11dog
The addition of the script `gigaspeech2.py` to Hugging Face is currently disabled. We do not support this functionality because our dataset was not uploaded through `datasets`.
To download the Thai subset, you can use the following commands (the `git lfs pull` step must be run from inside the cloned repository):
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/speechcolab/gigaspeech2
cd gigaspeech2
git lfs pull --include "data/th"
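If you would rather stay in Python, the same partial download can probably be done with `huggingface_hub` instead of git. The sketch below is not something the maintainers posted. It assumes `huggingface_hub` is installed and that the Thai files live under `data/th/`, as the `git lfs pull --include` path above suggests. The helper names `subset_patterns` and `download_subset` and the default `local_dir` are made up for illustration.

```python
from typing import List

# Dataset repository, taken from the clone URL above.
REPO_ID = "speechcolab/gigaspeech2"


def subset_patterns(lang: str) -> List[str]:
    """Build the glob pattern for one language folder (hypothetical helper).

    Assumes each language is stored under data/<lang>/ in the repo.
    """
    return [f"data/{lang}/*"]


def download_subset(lang: str = "th", local_dir: str = "gigaspeech2") -> str:
    """Download only the files for one language and return the local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repo is a dataset, not a model
        allow_patterns=subset_patterns(lang),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Triggers a large network download when run as a script.
    print(download_subset("th"))
```

This only fetches the raw files. It does not build a `datasets.Dataset`, so you would still need to read the downloaded archives and transcripts yourself.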
When I run `dataset = load_dataset("speechcolab/gigaspeech2", split='data.th')`, the program is interrupted by:
Traceback (most recent call last):
  File "/root/miniforge3/envs/audio_process/lib/python3.8/site-packages/datasets/builder.py", line 1894, in _prepare_split_single
    writer.write_table(table)
  File "/root/miniforge3/envs/audio_process/lib/python3.8/site-packages/datasets/arrow_writer.py", line 570, in write_table
    pa_table = table_cast(pa_table, self._schema)
  File "/root/miniforge3/envs/audio_process/lib/python3.8/site-packages/datasets/table.py", line 2324, in table_cast
    return cast_table_to_schema(table, schema)
  File "/root/miniforge3/envs/audio_process/lib/python3.8/site-packages/datasets/table.py", line 2282, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast