LAION-AI / audio-dataset

Audio Dataset for training CLAP and other models

MACS data #67

Closed krishnakalyan3 closed 2 years ago

krishnakalyan3 commented 2 years ago

MACS data processing pipeline.

TODO:

The MACS dataset is an augmented version of the TAU Urban Acoustic Scenes 2019 dataset.

krishnakalyan3 commented 2 years ago

cc: @marianna13 for review.

@YuchenHui22314 it looks like we already have this dataset here. Perhaps we should deduplicate?

YuchenHui22314 commented 2 years ago

Oh, it seems that they were deleted afterwards. However, you can re-download them using this script.

krishnakalyan3 commented 2 years ago

@YuchenHui22314 @marianna13 could you please review everything that has been done so far?

I am not sure about the following command:

python make_tar.py --input /fsx/MACS/processed_datasets --output /fsx/MACS/processed_datasets/webdataset_tar/ --dataclass all --num_element 512 --filename MCAS
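
For context, the command above appears to pack processed samples into WebDataset-style tar shards of 512 elements each. The sketch below illustrates that general idea only; it is not the repository's make_tar.py. The function name write_shards, its parameters, and the shard naming scheme are hypothetical. It assumes each sample is a set of files (for example audio plus JSON metadata) that share a basename, which is how WebDataset groups files into one sample.

```python
import io
import tarfile
from pathlib import Path


def write_shards(samples, output_dir, num_element=512, filename="shard"):
    """Pack samples into tar shards holding at most num_element samples each.

    samples: list of (key, {extension: bytes}) tuples (hypothetical layout).
    Returns the list of shard paths that were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    for start in range(0, len(samples), num_element):
        # Shard naming is an assumption, e.g. shard_00000.tar, shard_00001.tar
        shard_path = output_dir / f"{filename}_{start // num_element:05d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for key, files in samples[start:start + num_element]:
                # Files of one sample share the same basename (key)
                for ext, payload in files.items():
                    info = tarfile.TarInfo(name=f"{key}.{ext}")
                    info.size = len(payload)
                    tar.addfile(info, io.BytesIO(payload))
        shard_paths.append(shard_path)
    return shard_paths
```

With five samples and num_element=2, this would write three shards, the last one holding a single sample.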

YuchenHui22314 commented 2 years ago

@krishnakalyan3 Please check the Discord message.