couragelfyang opened 1 year ago
Hi, could you paste the full error message? And which code are you running?
---------------AST Model Summary---------------
ImageNet pretraining: True, AudioSet pretraining: True
Traceback (most recent call last):
File "main.py", line 118, in <module>
main(args, config)
File "main.py", line 85, in main
self.audio_model = ASTModel(label_dim=527, fstride=10, tstride=10, input_fdim=128,
File "model/ast_models.py", line 132, in __init__
self.v = audio_model.v
File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'v'
I'm running my own code. I'd like to use ASTModel as an encoder for input signals.
If you could paste the code you use for model creation and weight loading, I can take a look. But it seems you forgot to wrap the model in a torch.nn.DataParallel
object. Something like:
if not isinstance(audio_model, torch.nn.DataParallel):
audio_model = torch.nn.DataParallel(audio_model)
The reason is that our weights were trained with torch.nn.DataParallel
, so you need to wrap your model the same way to load the pretrained weights correctly.
One simple sample of correct model creation and weight loading is in the Colab script at https://github.com/YuanGongND/ast/blob/master/Audio_Spectrogram_Transformer_Inference_Demo.ipynb
It can be run online with one click. I guess what you need is:
input_tdim = 1024
checkpoint_path = '/content/ast/pretrained_models/audio_mdl.pth'
ast_mdl = ASTModel(label_dim=527, input_tdim=input_tdim, imagenet_pretrain=False, audioset_pretrain=False)
print(f'[*INFO] load checkpoint: {checkpoint_path}')
checkpoint = torch.load(checkpoint_path, map_location='cuda')
audio_model = torch.nn.DataParallel(ast_mdl, device_ids=[0])
audio_model.load_state_dict(checkpoint)
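If you'd rather not keep the DataParallel wrapper at inference time, a common alternative (a sketch of a generic PyTorch pattern, not something this repo provides; `strip_data_parallel_prefix` is a hypothetical helper name) is to strip the `module.` prefix that DataParallel adds to every parameter name, then load into the plain model:

```python
from collections import OrderedDict

import torch


def strip_data_parallel_prefix(state_dict):
    """Remove the 'module.' prefix that torch.nn.DataParallel prepends to
    every parameter name, so the weights load into an unwrapped model."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len("module."):] if key.startswith("module.") else key
        cleaned[new_key] = value
    return cleaned


# Tiny demonstration with a stand-in state dict (real checkpoints hold tensors).
wrapped = OrderedDict([
    ("module.v.cls_token", torch.zeros(1)),
    ("module.mlp_head.0.weight", torch.zeros(2)),
])
plain = strip_data_parallel_prefix(wrapped)
print(list(plain.keys()))  # ['v.cls_token', 'mlp_head.0.weight']
```

After cleaning, `ast_mdl.load_state_dict(strip_data_parallel_prefix(checkpoint))` should work without wrapping the model.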
Yeah, I've tried adding DataParallel, and it works. BTW, does AST only support inputs of length 1024?
does AST only support the input of length 1024
No, it supports any length, with or without pretrained weights. If you don't use pretrained weights, just change the input_tdim
when you create the AST model.
If you want to use pretrained weights, do NOT:
ast_mdl = ASTModel(label_dim=527, input_tdim=input_tdim, imagenet_pretrain=False, audioset_pretrain=False)
print(f'[*INFO] load checkpoint: {checkpoint_path}')
checkpoint = torch.load(checkpoint_path, map_location='cuda')
audio_model = torch.nn.DataParallel(ast_mdl, device_ids=[0])
audio_model.load_state_dict(checkpoint)
Do:
ast_mdl = ASTModel(label_dim=527, input_tdim=input_tdim, imagenet_pretrain=True, audioset_pretrain=True)
audio_model = torch.nn.DataParallel(ast_mdl, device_ids=[0])
The reason is that the model needs to internally adjust the positional embeddings to resolve the mismatch between pretraining and fine-tuning caused by the different tdim
. Check our ESC-50 recipe (tdim=512
) to see how that works in detail.
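To illustrate the general idea, here is a minimal sketch of interpolating a positional embedding along the time axis; this is illustrative only, under the assumption of a single CLS token, and the actual adjustment lives in `ast_models.py` (which also handles the distillation token and frequency-dimension changes):

```python
import torch
import torch.nn.functional as F


def adapt_time_pos_embed(pos_embed, f_dim, old_t, new_t):
    """Interpolate a (1, 1 + f_dim*old_t, D) positional embedding
    (CLS token first) to a new number of time patches, new_t."""
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    d = patch_pos.shape[-1]
    # (1, N, D) -> (1, D, f_dim, old_t): lay patches out on their 2-D grid
    grid = patch_pos.reshape(1, f_dim, old_t, d).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(f_dim, new_t),
                         mode="bilinear", align_corners=False)
    # Flatten the grid back to a token sequence and re-attach the CLS token
    patch_pos = grid.permute(0, 2, 3, 1).reshape(1, f_dim * new_t, d)
    return torch.cat([cls_tok, patch_pos], dim=1)


pe = torch.randn(1, 1 + 12 * 101, 768)          # AudioSet-pretrained shape
pe_512 = adapt_time_pos_embed(pe, 12, 101, 50)  # tdim=512 -> 50 time patches
print(pe_512.shape)  # torch.Size([1, 601, 768])
```

The pretrained weights stay useful because nearby patches get nearby (interpolated) embeddings rather than randomly re-initialized ones.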
I encountered this error when applying this repo to my data. How can I fix it?