YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License
1.17k stars, 221 forks

question about normalization #141

Open emleeee opened 4 days ago

emleeee commented 4 days ago

Hi YuanGong! Thanks for your magnificent work on audio classification. I have a question about your AST code; could you please give me some advice?

audio_conf = {'num_mel_bins': 128, 'target_length': args.audio_length, 'freqm': args.freqm, 'timem': args.timem, 'mixup': args.mixup, 'dataset': args.dataset, 'mode': 'train', 'mean': args.dataset_mean, 'std': args.dataset_std, 'noise': args.noise}
val_audio_conf = {'num_mel_bins': 128, 'target_length': args.audio_length, 'freqm': 0, 'timem': 0, 'mixup': 0, 'dataset': args.dataset, 'mode': 'evaluation', 'mean': args.dataset_mean, 'std': args.dataset_std, 'noise': False}

Why do you use the same args.dataset_mean and args.dataset_std for both the training data and the validation data?
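
To make the question concrete, here is a minimal sketch of what I understand the two configs to imply: one mean and one standard deviation are computed once and then applied to both splits. This is only my assumption about the behaviour, not the repository's actual dataloader code; the `normalize` helper and the toy values are hypothetical.

```python
import statistics


def normalize(values, mean, std):
    """Standardize values using externally supplied statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical toy values standing in for real filterbank features.
train_values = [-6.0, -4.0, -5.0, -3.0, -2.0]
val_values = [-4.5, -3.5]

# Assumption: the statistics are estimated once, on the training split only.
dataset_mean = statistics.fmean(train_values)
dataset_std = statistics.pstdev(train_values)

# The same two numbers are then reused for both splits,
# mirroring 'mean' and 'std' in audio_conf and val_audio_conf.
train_norm = normalize(train_values, dataset_mean, dataset_std)
val_norm = normalize(val_values, dataset_mean, dataset_std)
```

In this sketch the validation values are shifted and scaled by the training statistics rather than by their own.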

Thanks in advance for any reply!