Closed krishnakalyan3 closed 2 years ago
cc: @marianna13 for review.
@YuchenHui22314 it looks like we already have this dataset here. Perhaps we should deduplicate?
Oh, it seems that they were deleted afterwards. However, you can redownload them using this script.
@YuchenHui22314 @marianna13 could you please review everything that has been done so far?
I am not sure about the following: webdataset_tar. The total number of pairs in this dataset is 3930. Does --num_element 512 sound good for this dataset? Should the command below also be part of this GitHub repo somewhere?
python make_tar.py --input /fsx/MACS/processed_datasets --output /fsx/MACS/processed_datasets/webdataset_tar/ --dataclass all --num_element 512 --filename MCAS
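As a rough check on the shard-size question, here is a minimal sketch of how 3930 pairs would split at 512 per tar. It assumes that --num_element is the maximum number of pairs per shard and that all pairs are packed as one sequence; with --dataclass all, each split is presumably sharded separately, so the real counts would differ. shard_sizes is a hypothetical helper for illustration, not part of make_tar.py.

```python
def shard_sizes(total_pairs: int, num_element: int) -> list[int]:
    """Return the number of pairs in each shard (hypothetical helper).

    Assumes shards are filled in order, num_element pairs at a time,
    with any remainder going into one final, smaller shard.
    """
    if total_pairs < 0 or num_element <= 0:
        raise ValueError("total_pairs must be >= 0 and num_element > 0")
    full_shards, remainder = divmod(total_pairs, num_element)
    sizes = [num_element] * full_shards
    if remainder:
        sizes.append(remainder)
    return sizes


sizes = shard_sizes(3930, 512)
print(len(sizes), sizes[-1])  # 8 346
```

Under those assumptions, 512 gives seven full shards plus one shard of 346 pairs, so the choice mainly trades the number of tar files against the size of each one.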
@krishnakalyan3 Please check the Discord message.
MACS data processing pipeline.
TODO:
s3://s-laion-audio/raw_dataset/MACS
s3://s-laion-audio/webdataset_tar/MACS
The MACS dataset is an augmented version of the TAU Urban Acoustic Scenes 2019 dataset.