LAION-AI / audio-dataset

Audio Dataset for training CLAP and other models

Missing 'tag' key in FSD50k preprocessor #88

Open sakshamsingh1 opened 1 year ago

sakshamsingh1 commented 1 year ago

Hi, thanks for sharing the wonderful code.

According to the data preprocessing README (here), the output JSON file should contain a 'tag' key (holding the labels) after preprocessing. This tag extraction/creation seems to be missing from the preprocess_FSD50K.py file.

Am I misunderstanding something, or is the 'tag' creation missing from the file?

Thanks, Saksham

YuchenHui22314 commented 1 year ago

Hi, I think the tags are processed in preprocess_FSD50K.py, specifically here: https://github.com/LAION-AI/audio-dataset/blob/main/data_preprocess/preprocess_FSD50K.py#L119
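
For reference, here is a minimal sketch of what building a 'tag' list for one clip could look like. It is not the code from preprocess_FSD50K.py. It assumes the labels arrive as a single comma-separated string (as in the FSD50K ground-truth CSVs), and the function name `make_metadata`, the underscore-to-space cleanup, and the caption template in 'text' are all hypothetical choices made for illustration.

```python
import json


def make_metadata(labels_field: str) -> dict:
    """Build a per-clip metadata dict with a 'tag' list.

    Assumption: `labels_field` is a comma-separated label string such as
    "Electric_guitar,Guitar,Music". This is an illustrative helper, not
    the repository's actual implementation.
    """
    # Split on commas, drop empty pieces, and turn underscores into spaces.
    tags = [
        label.strip().replace("_", " ")
        for label in labels_field.split(",")
        if label.strip()
    ]
    return {
        # Hypothetical caption template, used here only to fill the 'text' key.
        "text": ["The sounds of " + ", ".join(tags)],
        "tag": tags,
    }


example = make_metadata("Electric_guitar,Guitar,Music")
print(json.dumps(example, indent=2))
```

To confirm whether a preprocessed file actually carries the key, loading the output JSON with `json.load` and checking `"tag" in data` should be enough.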