I am currently working on running inference code that takes an audio file along with a prompt as input and generates an output audio file (.wav format). However, I am unsure how to properly pass my custom audio file into the conditioning within the code. I would greatly appreciate an example of, or guidance on, how to achieve this.
import torchaudio
from stable_audio_tools.inference.generation import generate_diffusion_cond
# get_pretrained_controlnet_model comes from the controlnet fork I am using

model, model_config = get_pretrained_controlnet_model("stabilityai/stable-audio-open-1.0", controlnet_types=["audio"], depth_factor=0.2)
model = model.cuda()
sample_size = model_config["sample_size"]
sample_rate = model_config["sample_rate"]
prompt = "genre: rock; in: vocals_1, bass_1, drums_1; out: guitar_1"
conditioning = [{
    "audio": ???,  # how to fill in information here?
    "prompt": prompt,
    "seconds_start": 0,
    "seconds_total": 40
}]
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7.0,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device="cuda"
)
torchaudio.save("output.wav", output[0].cpu(), sample_rate=sample_rate)
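For context, here is how I am currently trying to prepare the file before putting it into the `"audio"` field. This is only a sketch based on my own assumptions: `prepare_conditioning_audio` is a helper I wrote myself, and I am guessing that the conditioner expects a stereo `(channels, samples)` float tensor resampled to the model's `sample_rate` and padded/trimmed to `sample_size`. Please correct me if the expected format is different (e.g. a file path or a `(tensor, sample_rate)` tuple).

```python
import torch


def prepare_conditioning_audio(audio, in_sr, target_sr, sample_size):
    """Resample, channel-fix, and pad/trim a (channels, samples) float tensor.

    This helper and its assumptions (stereo layout, silence padding) are mine,
    not from the repo.
    """
    if in_sr != target_sr:
        import torchaudio  # only needed when resampling
        audio = torchaudio.functional.resample(audio, in_sr, target_sr)
    # Force stereo: duplicate a mono channel, drop any extra channels.
    if audio.shape[0] == 1:
        audio = audio.repeat(2, 1)
    audio = audio[:2]
    # Pad with trailing silence or trim to exactly sample_size samples.
    n = audio.shape[-1]
    if n < sample_size:
        audio = torch.nn.functional.pad(audio, (0, sample_size - n))
    else:
        audio = audio[:, :sample_size]
    return audio


# Intended usage (paths/fields are my guess):
#   audio, sr = torchaudio.load("input.wav")
#   conditioning[0]["audio"] = prepare_conditioning_audio(
#       audio, sr, sample_rate, sample_size
#   ).cuda()
```

Is this roughly the right shape of thing to pass, or does the audio controlnet conditioner handle loading/resampling itself?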