EmilianPostolache / stable-audio-controlnet

Fine-tune Stable Audio Open with DiT ControlNet.

How to define conditioning dictionary with a single audio file? #3

Open Reuben-Sun opened 1 week ago

Reuben-Sun commented 1 week ago

I am currently working on running the inference code: I would like to input an audio file along with a prompt and generate an output audio file (.wav format). However, I am unsure how to properly pass my custom audio file in conditioning within the code. I would greatly appreciate an example or guidance on how to achieve this.

import torchaudio
# get_pretrained_controlnet_model is provided by this repository;
# generate_diffusion_cond comes from stable_audio_tools.
from stable_audio_tools.inference.generation import generate_diffusion_cond

model, model_config = get_pretrained_controlnet_model("stabilityai/stable-audio-open-1.0", controlnet_types=["audio"], depth_factor=0.2)
model = model.cuda()

sample_size = model_config["sample_size"]
sample_rate = model_config["sample_rate"]

prompt = "genre: rock; in: vocals_1, bass_1, drums_1; out: guitar_1"

conditioning = [{
    "audio": ???,    # how to fill in information here?
    "prompt": prompt,
    "seconds_start": 0,
    "seconds_total": 40
}]

output = generate_diffusion_cond(
        model,
        steps=100,
        cfg_scale=7.0,
        conditioning=conditioning,
        sample_size=sample_size,
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device="cuda"
)
torchaudio.save("output.wav", output[0].cpu(), sample_rate=44100)
EmilianPostolache commented 1 week ago

Please check out inference_musdb_audio_large.ipynb, which provides the requested information.