facebookresearch / AudioMAE

This repo hosts the code and models of "Masked Autoencoders that Listen".

Unable to utilize main_pretrain.py for speechcommands dataset training #16

Closed unoct closed 11 months ago

unoct commented 1 year ago

Hello, I would like to use the speechcommands dataset for pretraining, but I have encountered an error.

The command is as follows:

    python main_pretrain.py \
        --dataset='speechcommand' \
        --data_train='./speechcommand_train_data.json' \
        --data_eval='./datafiles/speechcommand_eval_data.json' \
        --label_csv='./speechcommands_class_labels_indices.csv' \


The following error occurs:

    File "../miniconda3/envs/mae/lib/python3.9/site-packages/timm/models/swin_transformer.py", line 330, in _shifted_window_attn
      x = x.view(B, H, W, C)
    RuntimeError: shape '[16, 64, 8, 512]' is invalid for input of size 524288


I modified this line of code (line 180 of main_pretrain.py):

    target_length = {'audioset':1024, 'esc50':512, 'speechcommands':128}

If I change 'speechcommands':128 to 1024, it runs without error, but I want to train with 128.

Could you please help me understand where I went wrong? Thank you!

unoct commented 11 months ago

Self-answer: in models_mae.py and models_vit.py, change the parameters used by 'unpatchify' (in models_mae.py) and 'random_masking_2d' (in both models_mae.py and models_vit.py) to values suitable for SPC (Speech Commands).
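The numbers in the error message are consistent with a patch grid that is fixed for 1024-frame input. The following is a minimal sketch of that arithmetic, not code from the repository. It assumes 16x16 patches and 128 mel bins (neither value is stated in this thread), and `patch_grid` is a hypothetical helper introduced only for illustration.

```python
def patch_grid(target_length, n_mels=128, patch=16):
    """Return (time_patches, freq_patches) for a spectrogram.

    n_mels=128 and patch=16 are assumptions, not values confirmed in this thread.
    """
    return target_length // patch, n_mels // patch


BATCH, DIM = 16, 512      # first and last entries of the shape in the error message
FAILED_NUMEL = 524288     # "input of size 524288" in the error message

h_long, w_long = patch_grid(1024)   # grid for target_length=1024
h_spc, w_spc = patch_grid(128)      # grid for target_length=128

# Elements the view(B, H, W, C) call asks for with H=64, W=8
expected_by_view = BATCH * h_long * w_long * DIM
# Elements actually present when the input has only 128 frames
actual_with_128 = BATCH * h_spc * w_spc * DIM

print((h_long, w_long), expected_by_view)  # (64, 8) 4194304
print((h_spc, w_spc), actual_with_128)     # (8, 8) 524288
```

Under these assumptions, a 128-frame input yields an 8x8 grid (64 tokens per sample), which matches the 524288 elements reported, while the reshape expects a 64x8 grid. That would explain why setting the length to 1024 runs, and why grid sizes elsewhere in the model need to be adjusted for 128-frame input.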