facebookresearch / AudioMAE

This repo hosts the code and models of "Masked Autoencoders that Listen".

ViT-L checkpoint and reproducing the visualization results #20

Open i-need-sleep opened 1 year ago

i-need-sleep commented 1 year ago

Hello,

Thanks for the great repo.

I am trying to reproduce the visualization results in the paper for the reconstructed spectrograms. Following the demo notebook and using the pretrained ViT-B checkpoint, the results I got (see attached) are notably worse than those reported in the paper.

I note that the visualizations in the paper are based on the larger ViT-L model. Is it possible for you to share the pretrained checkpoint?

Additionally, can you confirm whether the model configuration used in the notebook is correct?

Thanks in advance!

[Attached images: masked — spectrogram masked with a ratio of 0.3; recons_pasted — reconstructed patches]
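For reference, the flow the demo notebook follows is the standard MAE one: build the masked autoencoder, load the pretrained weights, run a forward pass at the chosen mask ratio, and map the predicted patches back to the spectrogram. A minimal sketch of that flow is below; the module name `models_mae`, the constructor `mae_vit_base_patch16` and its arguments, the `'model'` checkpoint key, and the forward/`unpatchify` signatures are assumptions based on the upstream facebookresearch/mae code, so the demo notebook and `models_mae.py` in this repo remain the authoritative reference.

```python
# Minimal sketch (assumptions noted inline): reconstruct a masked log-mel
# spectrogram with a pretrained AudioMAE ViT-B encoder-decoder.
import torch
import models_mae  # module name assumed, following the upstream facebookresearch/mae layout

# Build the masked autoencoder. The constructor name and the audio-specific
# arguments (1-channel, 1024x128 log-mel input) are assumptions; check
# models_mae.py and the demo notebook for the exact configuration.
model = models_mae.mae_vit_base_patch16(in_chans=1, img_size=(1024, 128))

# Load pretrained weights; the 'model' key follows the MAE checkpoint convention.
ckpt = torch.load("pretrained_vit_b.pth", map_location="cpu")  # placeholder path
model.load_state_dict(ckpt["model"], strict=False)
model.eval()

# Dummy input shaped like a 10 s AudioSet clip's log-mel spectrogram (assumed shape).
spec = torch.randn(1, 1, 1024, 128)

with torch.no_grad():
    # MAE-style forward returns (loss, pred, mask); mask_ratio=0.3 matches the
    # ratio used for the attached visualizations.
    loss, pred, mask = model(spec, mask_ratio=0.3)
    # MAE-style helper that maps predicted patches back to the input layout;
    # how it handles the rectangular audio patch grid should be verified here.
    recon = model.unpatchify(pred)

print(recon.shape)
```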

shirly-24 commented 10 months ago

Hello, I have faced a similar issue. Have you managed to resolve it? I would appreciate it if you could share any insights or solutions you might have. Thank you!

i-need-sleep commented 10 months ago

I got in touch with one of the authors. This seems to be the expected behaviour of the ViT-B checkpoint.

wsntxxn commented 6 months ago

Hi, do you have access to the ViT-L checkpoint? I am also looking for it.