Closed · TheZaind closed this issue 5 months ago
I'm sorry if I misunderstood, but my understanding is that you'd like to train an audio model? I'm not sure kohya_ss is the right script for that. As it stands, kohya_ss is built primarily for image training, and I think the issue you're describing is related to the fact that, when generating an image, it's very difficult to train for and generate images with high contrast.
If your objective is to output audio, I wonder whether looking into audio-specific model training might make more sense.
Otherwise, the only thing I can imagine might work, if you are saying the "audio" images are correct but stretched out, would be to adjust the image width to match the clip length. For example, if you are generating a 5-second clip, manually adjust the generation to however many horizontal pixels 5 seconds is supposed to span.
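If the spectrogram images map time to width linearly, the workaround above can be sketched as a small post-processing step: generate (or squeeze) the spectrogram to the width the clip's duration implies, then pad the rest of the canvas with white, which the training images use for silence. This is only a sketch under assumptions, not kohya_ss functionality; the pixels-per-second value and the 512-px canvas are hypothetical placeholders for whatever your own pipeline uses. The image is represented as a plain list of pixel rows to keep the example self-contained.

```python
def pad_to_canvas(spec, duration_s, pixels_per_second=100, canvas_width=512, white=255):
    """Squeeze a generated spectrogram to the width its duration implies,
    then pad the remaining canvas with white (silence).

    spec: list of rows, each a list of single-channel pixel values.
    pixels_per_second / canvas_width: hypothetical values; substitute the
    time resolution and image width your own training data actually uses.
    """
    # Width the real sound should occupy on the canvas.
    target_w = min(round(duration_s * pixels_per_second), canvas_width)
    out = []
    for row in spec:
        # Nearest-neighbour resample the row down to target_w columns...
        resampled = [row[int(i * len(row) / target_w)] for i in range(target_w)]
        # ...then fill the rest of the row with white, like the training images.
        out.append(resampled + [white] * (canvas_width - target_w))
    return out
```

For example, a full-width 512-px generation of a 2-second sound would be squeezed to 200 columns and followed by 312 white columns, so the converted audio plays at the right speed and the silent tail is preserved.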
Problem with big white spaces when training own model

Hello,
I am currently training my own diffusion model. Everything is working fine, but when I use sounds shorter than 5 seconds (meaning only a fraction of each training image is used) and try to recreate them with my model, the output fills the whole image instead of using just a fraction of it, like the original images do. When converting back to sound, it "sounds right" for the prompt, but stretched. I want the model to generate the big white spaces as well. I am using over 1000 training images and have trained for over 500,000 iterations.
Example training data images:
Example generated images (which should look like the training images):
Any idea how I can fix this or what I'm doing wrong? I am using kohya_ss webUI for training.
Thanks! :)