Hi there,
The short answer is no, it is not supported yet in our code, and we do not have a plan to add that feature.
The reason is that when you change the kernel size, you change the patch splitting strategy. For example, when you set the kernel size to (8,8), you are using 8*8 patches rather than 16*16 patches, while the ImageNet pretrained models are pretrained on 16*16 patches. The pretrained weights of the patch splitting layer therefore no longer match, and the layer will complain.
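To make the mismatch concrete, here is a minimal sketch in plain Python (no PyTorch needed). It only does the shape arithmetic of an unpadded convolution used as a patch splitter. The helper names `patch_grid` and `proj_weight_shape` are hypothetical, and the 128 frequency bins, 1024 frames, stride of 10, and embedding size of 768 are assumed values for illustration, not something stated in this thread.

```python
def patch_grid(input_fdim, input_tdim, kernel, fstride, tstride):
    """Number of patches along frequency and time for an unpadded conv patch splitter."""
    kf, kt = kernel
    f_dim = (input_fdim - kf) // fstride + 1
    t_dim = (input_tdim - kt) // tstride + 1
    return f_dim, t_dim


def proj_weight_shape(embed_dim, kernel, in_channels=1):
    """Weight shape of the patch projection: (out_channels, in_channels, kernel_h, kernel_w)."""
    return (embed_dim, in_channels, kernel[0], kernel[1])


# Assumed illustrative setup: 128 mel bins, 1024 frames, stride 10 on both axes.
grid_16 = patch_grid(128, 1024, (16, 16), 10, 10)
grid_8 = patch_grid(128, 1024, (8, 8), 10, 10)

# Assumed embedding size of 768.
pretrained_shape = proj_weight_shape(768, (16, 16))
requested_shape = proj_weight_shape(768, (8, 8))

print("16*16 grid:", grid_16, "patches:", grid_16[0] * grid_16[1])
print("8*8 grid:  ", grid_8, "patches:", grid_8[0] * grid_8[1])
print("weights match:", pretrained_shape == requested_shape)
```

Under these assumptions, changing the kernel changes both the number of patches (and so the number of positional embeddings needed) and the shape of the projection weights, so weights pretrained on 16*16 patches cannot be loaded directly into an 8*8 projection.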
If you don't use ImageNet pretraining, AST should support arbitrary patch sizes; you will need to change at least this line in addition to the get_shape function. You might need to change other things as well. But without ImageNet pretraining, the AST performance is not competitive.
Finally, I want to point to our recent work, SSAST, which I hope will be released soon. SSAST uses self-supervised pretraining as a replacement for ImageNet pretraining, which does not constrain the patch size to be 16*16. Nevertheless, in SSAST we have not tried 8*8 patches either, as it is expensive (so no 8*8 pretrained model will be released; you will need to pretrain it yourself, but that is fully supported). We do plan to release a 128*2 pretrained model.
-Yuan
Thank you very much for your quick reply, it solved my problem. At the same time, I will continue to follow SSAST.
Hello @YuanGongND, I'm sorry to bother you again.
I would like to ask you a question: how can I change the kernel size in order to change the number of patches? I USE the ImageNet pretrained model and do NOT USE the AudioSet pretrained model, but I have this problem.
I only changed the get_shape function, like this
So, what is the correct way to do this? Looking forward to your answer.