YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

'ASTModel' object has no attribute 'module' #80

Open couragelfyang opened 1 year ago

couragelfyang commented 1 year ago

I encountered this error when applying this repo to my data. How can I fix it?

YuanGongND commented 1 year ago

Hi, could you paste the full error message? And which code are you running?

couragelfyang commented 1 year ago
---------------AST Model Summary---------------
ImageNet pretraining: True, AudioSet pretraining: True
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    main(args, config)
  File "main.py", line 85, in main
    self.audio_model = ASTModel(label_dim=527, fstride=10, tstride=10, input_fdim=128,
  File "model/ast_models.py", line 132, in __init__
    self.v = audio_model.v
  File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'v'

I'm running my code. I'd like to make ASTModel act as an encoder encoding the input signals.

YuanGongND commented 1 year ago

If you could paste your model-creation and weight-loading code, I can take a look. But it seems that you forgot to wrap the model in a torch.nn.DataParallel object. Something like:

if not isinstance(audio_model, torch.nn.DataParallel):
    audio_model = torch.nn.DataParallel(audio_model)

The reason is that our weights were saved from a torch.nn.DataParallel model, so you need to wrap your model in DataParallel as well for the pretrained weights to load correctly.
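To make the mismatch concrete: a state dict saved from a DataParallel-wrapped model prefixes every parameter key with "module.", so loading it into a bare model fails with missing/unexpected keys. Wrapping the model (as above) is the repo's approach; a common alternative, sketched here with hypothetical key names, is to strip the prefix before loading:

```python
# Weights saved from torch.nn.DataParallel carry a "module." prefix on every key.
# Example keys are hypothetical, for illustration only.
saved_keys = ["module.v.cls_token", "module.v.pos_embed", "module.mlp_head.1.weight"]

def strip_module_prefix(keys):
    """Remove the DataParallel 'module.' prefix so weights fit an unwrapped model."""
    return [k[len("module."):] if k.startswith("module.") else k for k in keys]

print(strip_module_prefix(saved_keys))
# ['v.cls_token', 'v.pos_embed', 'mlp_head.1.weight']
```

In practice you would apply the same renaming to the checkpoint's state dict (e.g. rebuild the dict with stripped keys) and then call load_state_dict on the bare model.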

YuanGongND commented 1 year ago

One simple sample of correct model creation and weight loading is in the Colab script at https://github.com/YuanGongND/ast/blob/master/Audio_Spectrogram_Transformer_Inference_Demo.ipynb

It can be run online with one click. I guess what you need is:

input_tdim = 1024
checkpoint_path = '/content/ast/pretrained_models/audio_mdl.pth'
ast_mdl = ASTModel(label_dim=527, input_tdim=input_tdim, imagenet_pretrain=False, audioset_pretrain=False)
print(f'[*INFO] load checkpoint: {checkpoint_path}')
checkpoint = torch.load(checkpoint_path, map_location='cuda')
audio_model = torch.nn.DataParallel(ast_mdl, device_ids=[0])
audio_model.load_state_dict(checkpoint)
couragelfyang commented 1 year ago

Yeah, I've tried adding DataParallel and it works. BTW, does AST only support inputs of length 1024?

YuanGongND commented 1 year ago

does AST only support the input of length 1024

No, it supports any input length, with or without pretrained weights. If you don't use pretrained weights, just change input_tdim when you create the AST model.

If you want to use pretrained weights, do NOT:

ast_mdl = ASTModel(label_dim=527, input_tdim=input_tdim, imagenet_pretrain=False, audioset_pretrain=False)
print(f'[*INFO] load checkpoint: {checkpoint_path}')
checkpoint = torch.load(checkpoint_path, map_location='cuda')
audio_model = torch.nn.DataParallel(ast_mdl, device_ids=[0])
audio_model.load_state_dict(checkpoint)

Do:

ast_mdl = ASTModel(label_dim=527, input_tdim=input_tdim, imagenet_pretrain=True, audioset_pretrain=True)
audio_model = torch.nn.DataParallel(ast_mdl, device_ids=[0])

The reason is that the model needs to internally adjust for the positional-embedding mismatch between pretraining and fine-tuning caused by the different tdim. Check our ESC-50 recipe (tdim=512) to see how that works in detail.
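The tdim dependence can be sketched with simple arithmetic (assuming, as in the AST paper, 16x16 patches taken with the given frequency/time strides): changing input_tdim changes the number of time patches, and hence the length of the positional embedding the model must adapt.

```python
# Sketch: how the spectrogram size determines the patch count in AST
# (assumes 16x16 patches with overlapping strides, per the paper).
def num_patches(input_fdim, input_tdim, fstride=10, tstride=10, patch=16):
    f = (input_fdim - patch) // fstride + 1  # patches along frequency
    t = (input_tdim - patch) // tstride + 1  # patches along time
    return f * t

print(num_patches(128, 1024))  # AudioSet pretraining: 12 * 101 = 1212 patches
print(num_patches(128, 512))   # ESC-50 fine-tuning:   12 * 50  = 600 patches
```

Since 1212 != 600, the pretrained positional embedding cannot be copied as-is; passing audioset_pretrain=True lets the model handle this adjustment internally at construction time.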