Closed yuyun2000 closed 5 months ago
I was wrong. I thought it was a VAE-like model that could be used in diffusion models, but it doesn't have the ability to reconstruct audio.
Hey, so technically it can, but I don't provide the decoder here. VAE and MAE are both autoencoders, so the encoder features can in principle be decoded back. If you want, you could just attach a decoder on top of the model and train it on your own latents.
Btw, for your info, SemantiCodec is a codec based on MAE that shows promising performance. I would assume that Dasheng greatly outperforms their AudioMAE. Furthermore, Frepainter has shown that MAE-based approaches can outperform diffusion for super-resolution.
In both cases you would just need to attach a decoder to your Dasheng features and train it. I believe you can give it a try :) Kind regards, Heinrich
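To make the "attach a decoder and train" idea a bit more concrete, here is a minimal PyTorch sketch. It is only an illustration under stated assumptions: `StandInEncoder` is a hypothetical placeholder for the pretrained model (it is not the Dasheng API), `MelDecoder` is a made-up small MLP, and the batch is random data standing in for log-mel spectrograms. The only point is the pattern: freeze the encoder, train a decoder on its features with a reconstruction loss.

```python
import torch
from torch import nn

torch.manual_seed(0)


class StandInEncoder(nn.Module):
    """Hypothetical placeholder for a pretrained MAE-style encoder.

    Maps a spectrogram (batch, n_mels, frames) to features
    (batch, frames // hop, dim). Swap in the real pretrained model here.
    """

    def __init__(self, n_mels=64, dim=128, hop=4):
        super().__init__()
        self.proj = nn.Conv1d(n_mels, dim, kernel_size=hop, stride=hop)

    def forward(self, mel):
        return self.proj(mel).transpose(1, 2)


class MelDecoder(nn.Module):
    """Small trainable decoder: each feature token predicts `hop` mel frames."""

    def __init__(self, dim=128, n_mels=64, hop=4):
        super().__init__()
        self.n_mels, self.hop = n_mels, hop
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.GELU(), nn.Linear(256, n_mels * hop)
        )

    def forward(self, feats):
        b, t, _ = feats.shape
        out = self.net(feats).view(b, t, self.hop, self.n_mels)
        # (b, t, hop, n_mels) -> (b, n_mels, t * hop)
        return out.reshape(b, t * self.hop, self.n_mels).transpose(1, 2)


# Freeze the encoder so that only the decoder is trained.
encoder = StandInEncoder().eval()
for p in encoder.parameters():
    p.requires_grad_(False)

decoder = MelDecoder()
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

mel = torch.randn(8, 64, 100)  # random stand-in batch, not real audio
losses = []
for step in range(50):
    with torch.no_grad():
        feats = encoder(mel)  # frozen features, shape (8, 25, 128)
    recon = decoder(feats)  # reconstruction, shape (8, 64, 100)
    loss = nn.functional.l1_loss(recon, mel)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Note that this sketch only reconstructs a spectrogram-like target. Getting back to a waveform would additionally need a vocoder or a decoder that predicts samples directly, which is left out here.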
I am just a beginner, but if given the opportunity, I will definitely use Dasheng to showcase its capabilities. :)
Hey, sorry, I don't get the title or the question.