LAION-AI / audio-dataset

Audio Dataset for training CLAP and other models

CLAP_freesound broken #99

Open cyrusvahidi opened 9 months ago

cyrusvahidi commented 9 months ago

I am trying to use the CLAP_freesound dataset from the Hugging Face Hub. Loading it with the datasets library raises the following error:

(ssl) bash-4.2$ python
Python 3.10.7 (main, Nov  2 2022, 14:46:09) [GCC 12.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datasets import load_dataset
>>> load_dataset("Meranti/CLAP_freesound")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/home/acw532/venvs/ssl/lib/python3.10/site-packages/datasets/load.py", line 2128, in load_dataset
    builder_instance = load_dataset_builder(
  File "/data/home/acw532/venvs/ssl/lib/python3.10/site-packages/datasets/load.py", line 1814, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/data/home/acw532/venvs/ssl/lib/python3.10/site-packages/datasets/load.py", line 1511, in dataset_module_factory
    raise e1 from None
  File "/data/home/acw532/venvs/ssl/lib/python3.10/site-packages/datasets/load.py", line 1495, in dataset_module_factory
    ).get_module()
  File "/data/home/acw532/venvs/ssl/lib/python3.10/site-packages/datasets/load.py", line 1053, in get_module
    module_name, default_builder_kwargs = infer_module_for_data_files(
  File "/data/home/acw532/venvs/ssl/lib/python3.10/site-packages/datasets/load.py", line 512, in infer_module_for_data_files
    raise ValueError(f"Couldn't infer the same data file format for all splits. Got {split_modules}")
ValueError: Couldn't infer the same data file format for all splits. Got {NamedSplit('train'): (None, {}), NamedSplit('test'): ('json', {})}

I am finding it difficult to get started with LAION-Audio-630K and to aggregate all of its constituent datasets.
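
For context on the ValueError in the traceback: it reports that no loader could be inferred for the train split (None) while the test split resolved to json. The sketch below is a simplified illustration of that kind of per-split check, not the actual code in datasets/load.py. The extension table, both function names, and the file layout are hypothetical and chosen only to reproduce the same message; the real contents of Meranti/CLAP_freesound have not been inspected here.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed, heavily reduced extension -> loader table (illustration only).
EXTENSION_TO_MODULE = {
    ".json": "json",
    ".jsonl": "json",
    ".csv": "csv",
    ".parquet": "parquet",
}


def infer_module_for_split(files):
    """Return the loader for the most common recognised extension, else None."""
    counts = Counter(PurePosixPath(f).suffix.lower() for f in files)
    for ext, _ in counts.most_common():
        if ext in EXTENSION_TO_MODULE:
            return EXTENSION_TO_MODULE[ext]
    return None


def infer_module_for_all_splits(data_files):
    """Require every split to resolve to the same, known loader."""
    split_modules = {
        split: infer_module_for_split(files) for split, files in data_files.items()
    }
    modules = set(split_modules.values())
    if len(modules) != 1 or None in modules:
        raise ValueError(
            f"Couldn't infer the same data file format for all splits. Got {split_modules}"
        )
    return modules.pop()


# Hypothetical layout: train holds only archives, test also holds a JSON file.
hypothetical_files = {
    "train": ["train/0.tar", "train/1.tar"],
    "test": ["test/0.tar", "test/metadata.json"],
}

try:
    infer_module_for_all_splits(hypothetical_files)
except ValueError as err:
    print(err)
```

If the repository really does mix file types across splits like this, one possible direction is to pass an explicit data_files mapping to load_dataset, or to download the raw files and read them directly. Whether either works depends on the actual file layout.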