Open 980202006 opened 5 months ago
Your VAE loss is really low. May I ask how many steps you trained the VAE for? I have been training my VAE for about 300,000 steps and the loss is about 3.5. I'm using quite a large dataset.
Sorry, I remembered it wrong. VAE loss is about 1.0, but the effect is still not good enough. Can you tell me about your data volume and distribution? Maybe it's a coding problem?
I'm using the stable_audio_2_0_vae.json config and I didn't change the model. Maybe your dataset is not large enough? I'm using Audioset + VGGSound + mtg-jamendo + BBCSoundEffect + CommonVoice + free_to_use_sound and a bunch of other smaller datasets. Here is my sample:
It is true that your data set is more complex and the loss should be larger. Is there a demo of the reconstruction effect?
The video in my above comment is the demo, I think it's quite good.
Yes! This reconstruction works very well. What is the loss of the diffusion model you trained?
I will start to train the diffusion model in the next few days, will update my status in this issue :)
Waiting for your good news!
Thank you for sharing @BingliangLi ❤️
Would you kindly share your model config and dataset config for beginners' reference? I would really appreciate it!!
Sorry, I remembered it wrong. VAE loss is about 1.0, but the effect is still not good enough. Can you tell me about your data volume and distribution? Maybe it's a coding problem?
My diffusion model's MSE loss is 0.8 and I have the same concern as you. Could you please also share your diffusion loss curve? Thank you.
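I'm using the stable_audio_2_0_vae.json config and I didn't change the model. Maybe your dataset is not large enough? I'm using Audioset + VGGSound + mtg-jamendo + BBCSoundEffect + CommonVoice + free_to_use_sound and a bunch of other smaller datasets. Here is my sample:
侦察_00326001.mp4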
This reconstruction works very well. How many GPUs did you use for training, and what was the batch size?
I trained the model for 1,000,000 steps on 8 A100 80G GPUs with the batch size set to 16. However, the reconstruction quality was already good enough at around 500,000 steps. I haven't started training the diffusion model yet, but I will do it before July.
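For anyone comparing runs, here is a rough sketch of how many training examples those figures correspond to. The comment does not say whether 16 is the per-GPU or the global batch size, so both readings are shown. The helper `samples_seen` is a hypothetical name for illustration, not part of stable-audio-tools, and gradient accumulation is assumed to be off.

```python
def samples_seen(steps: int, batch_size: int, num_gpus: int = 1, per_gpu: bool = True) -> int:
    """Training examples processed after `steps` optimizer steps.

    Assumption: no gradient accumulation. If `per_gpu` is True, `batch_size`
    is taken as the batch on each GPU; otherwise it is the global batch.
    """
    effective_batch = batch_size * num_gpus if per_gpu else batch_size
    return steps * effective_batch


# Figures quoted in the thread: 8 GPUs, batch size 16.
for steps in (500_000, 1_000_000):
    if_per_gpu = samples_seen(steps, 16, num_gpus=8, per_gpu=True)
    if_global = samples_seen(steps, 16, num_gpus=8, per_gpu=False)
    print(f"{steps:,} steps: {if_per_gpu:,} samples if 16 is per GPU, "
          f"{if_global:,} samples if 16 is the global batch")
```

Under the per-GPU reading, 500,000 steps is 64 million examples; under the global reading it is 8 million. That difference matters when comparing step counts between setups with different hardware.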
Hey there @BingliangLi, did you end up training the diffusion model? Great results! I would love to know what parameters you used.
I am training the VAE and Stable Audio 2 models from scratch. What loss values should the VAE and the diffusion model reach? My current VAE loss is about 0.4, and the diffusion model's MSE loss is 0.53.