Stability-AI / stable-audio-tools

Generative models for conditional audio generation
MIT License

Gradio issue - Seconds total is ignored #85

Open SoftologyPro opened 3 months ago

SoftologyPro commented 3 months ago

No matter what the seconds total slider is set to, the output wav is always 47 seconds long.

SoftologyPro commented 3 months ago

I also tried the sample code shown at https://huggingface.co/stabilityai/stable-audio-open-1.0 outside of Gradio. Changing seconds_total does not seem to work there either. I changed the code as follows:

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_start": 0, 
    "seconds_total": 180
}]

and still got a 47-second wav rather than the expected 3-minute result.

yukara-ikemiya commented 3 months ago

As the model description at https://huggingface.co/stabilityai/stable-audio-open-1.0#model-description states, Stable Audio Open 1.0 can generate at most 47 seconds of audio. That is why you always get 47 seconds when you request a total length longer than that.

When you specify a shorter total length, the model still returns a 47-second signal, but the audio after the time you specify should be silence (zero values). You can therefore trim the silent segment in postprocessing.
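
The trimming step could look something like the sketch below. It is not part of stable-audio-tools: the helper name samples_to_keep is hypothetical, and the 44100 Hz sample rate in the usage lines is an assumption (in practice, read it from the model config). The helper only computes how many samples correspond to the requested duration, capped at what the model actually produced.

```python
def samples_to_keep(seconds_total, sample_rate, n_samples):
    """Return how many leading samples cover `seconds_total` seconds.

    Hypothetical helper, not a stable-audio-tools function.
    The result is clamped to the range [0, n_samples], so asking for more
    than the model generated simply keeps everything.
    """
    requested = int(round(seconds_total * sample_rate))
    return max(0, min(requested, n_samples))


# Usage with assumed values: a 47-second output at 44100 Hz,
# of which only the first 10 seconds were requested.
sample_rate = 44100                 # assumption; take this from the model config
generated = 47 * sample_rate        # samples the model returned per channel
n = samples_to_keep(10, sample_rate, generated)
# With an output of shape (channels, samples), the trim is then a slice
# along the last axis, e.g. output[..., :n]
```

Slicing by the requested duration avoids having to detect silence by thresholding, which could cut off a quiet tail that is genuinely part of the sound.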

SoftologyPro commented 3 months ago

OK, then maybe clamp the seconds total slider in the UI to 47 to avoid confusion? A quick comment in the sample code saying 47 is the maximum would help too. And for requests shorter than 47 seconds, trim the output rather than returning 47 seconds padded with silence.
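
As a rough illustration of the clamping idea, assuming the 47-second limit mentioned above: the names MAX_SECONDS and clamp_seconds_total are hypothetical and do not exist in stable-audio-tools or its Gradio UI. The conditioning dict mirrors the one from the sample code.

```python
MAX_SECONDS = 47  # maximum length for Stable Audio Open 1.0, per the model description


def clamp_seconds_total(seconds_total, max_seconds=MAX_SECONDS):
    """Limit a requested duration to the range [0, max_seconds].

    Hypothetical helper, shown only to illustrate the suggestion.
    """
    return max(0, min(seconds_total, max_seconds))


# Set up text and timing conditioning, with the request clamped first
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_start": 0,
    "seconds_total": clamp_seconds_total(180),  # 47 is the max, so 180 becomes 47
}]
```

Clamping before building the conditioning makes the limit visible to the caller instead of silently returning a shorter clip than requested.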

yukara-ikemiya commented 3 months ago

I also think that would be more convenient.