Unable to utilize main_pretrain.py for speechcommands dataset training

Hello, I would like to use the speechcommands dataset for pretraining, but I have encountered an error.

The command is as follows:

python main_pretrain.py \
    --dataset='speechcommand' \
    --data_train='./speechcommand_train_data.json' \
    --data_eval='./datafiles/speechcommand_eval_data.json' \
    --label_csv='./speechcommands_class_labels_indices.csv' \

The following error occurs:

File "../miniconda3/envs/mae/lib/python3.9/site-packages/timm/models/swin_transformer.py", line 330, in _shifted_window_attn
    x = x.view(B, H, W, C)
RuntimeError: shape '[16, 64, 8, 512]' is invalid for input of size 524288

I modified line 180 of main_pretrain.py:

target_length = {'audioset':1024, 'esc50':512, 'speechcommands':128}

If I change 'speechcommands':128 to 1024, it runs without error, but I want to run it with 128.
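
For reference, the numbers in the error message can be checked with a short calculation. This is only a sketch: the 16x16 patch size and the 128 mel bins are assumptions that are not stated anywhere above, and num_patches is a hypothetical helper, not a function from the repository.

```python
# Values taken directly from the error message.
expected_shape = (16, 64, 8, 512)   # (B, H, W, C) requested by x.view
actual_numel = 524288               # number of elements actually in the tensor

B, H, W, C = expected_shape
expected_tokens = H * W                    # tokens per sample that the view expects
actual_tokens = actual_numel // (B * C)    # tokens per sample actually present

# Assumptions (not confirmed by the report): 16x16 patches, 128 mel bins.
PATCH = 16
MEL_BINS = 128


def num_patches(target_length, mel_bins=MEL_BINS, patch=PATCH):
    """Hypothetical helper: (time, frequency) patch grid for a spectrogram."""
    return target_length // patch, mel_bins // patch


print("tokens expected by view:", expected_tokens)
print("tokens actually present:", actual_tokens)
print("grid for target_length=1024:", num_patches(1024))
print("grid for target_length=128:", num_patches(128))
```

Under those assumptions, a target_length of 1024 gives exactly the 64 x 8 grid that the view expects, while 128 gives an 8 x 8 grid, which matches the 64 tokens per sample actually present. That would be consistent with the 64 x 8 grid being fixed somewhere for the 1024-frame case, although the traceback alone does not confirm where.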

Could you please help me understand where I went wrong? Thank you!

facebookresearch / AudioMAE, issue #16