archinetai / audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.
MIT License
1.95k stars 169 forks

Unconditional Generation generates noise #82

Open reachomk opened 8 months ago

reachomk commented 8 months ago

Hi,

I'm training on a dataset of songs with this package. After about 10 epochs (of 1000 samples each) the loss seems to converge, but when I sample I get pure noise. My intuition is that even if the model has converged to a poor local minimum, or I haven't trained long enough, it should still produce some structured output (garbage in, garbage out should still yield something other than pure noise). So I suspect there's an issue with the way I'm generating the audio. I've attached my code below.

Any suggestions, or anything more I need to provide?

import torch
import torchaudio

def generate_samples(model, num_samples, sample_rate, audio_length_seconds, device):
    """
    Generate audio samples from the trained diffusion model.

    :param model: The trained diffusion model.
    :param num_samples: The number of audio samples to generate.
    :param sample_rate: The sample rate of the audio.
    :param audio_length_seconds: The length of the audio to generate, in seconds.
    :param device: The device ('cpu' or 'cuda') to run the sampling on.
    :return: A tensor containing the generated audio samples.
    """
    audio_length = sample_rate * audio_length_seconds
    # Initialize with random noise
    noise = torch.randn(num_samples, 1, audio_length, device=device)

    model.eval() 

    with torch.no_grad():  
        samples = model.sample(noise, num_steps=100)  

    return samples
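As a side note, one way to check whether a generated tensor really is pure noise (rather than just bad-sounding audio) is to look at its lag-1 autocorrelation: white noise sits near 0, while any smooth, structured waveform sits near 1. This is a minimal self-contained sketch of that check (`lag1_autocorr` is a hypothetical helper, not part of the library):

```python
import torch

def lag1_autocorr(x: torch.Tensor) -> float:
    """Lag-1 autocorrelation of a 1-D signal: near 0 for white noise,
    near 1 for a smooth, structured waveform."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x.pow(2).sum() + 1e-12))

sr = 16000
t = torch.arange(sr) / sr
sine = torch.sin(2 * torch.pi * 440.0 * t)  # structured signal
noise = torch.randn(sr)                     # white noise

print(f"sine  autocorr: {lag1_autocorr(sine):.3f}")   # close to 1
print(f"noise autocorr: {lag1_autocorr(noise):.3f}")  # close to 0
```

Running this on `samples.squeeze()` after sampling would distinguish "the model outputs noise" from "the output is merely low quality".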

# Example usage after training the model:
waveform, sample_rate = torchaudio.load(audio_path)
num_samples = 1  # Number of samples to generate
#sample_rate = dataset.sample_rate  # Sample rate of the audio
audio_length_seconds = 20  # Length of the audio to generate, in seconds
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Generate samples
generated_audio = generate_samples(model, num_samples, sample_rate, audio_length_seconds, device)

# Save the generated samples as FLAC files
for i, audio_tensor in enumerate(generated_audio):
    filename = f"generated_sample_{i+1}.flac"
    torchaudio.save(filename, audio_tensor.cpu(), sample_rate)
    print(f"Saved: {filename}")
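One more thing I considered: diffusion model outputs aren't guaranteed to stay inside [-1, 1], and amplitudes outside that range can clip when encoded to an integer format like FLAC. This is a minimal sketch of peak-normalizing before saving (`normalize_for_save` and the 0.95 peak are my own choices, not part of the library):

```python
import torch

def normalize_for_save(audio: torch.Tensor, peak: float = 0.95) -> torch.Tensor:
    """Scale audio so its maximum absolute amplitude equals `peak`,
    avoiding clipping when encoding to integer formats like FLAC."""
    max_amp = audio.abs().max()
    if max_amp > 0:
        audio = audio * (peak / max_amp)
    return audio

x = torch.tensor([[0.5, -2.0, 1.5]])  # amplitudes outside [-1, 1]
y = normalize_for_save(x)
print(y.abs().max())  # tensor(0.9500)
```

Applying this to each `audio_tensor` before `torchaudio.save` rules out clipping as the cause, though it wouldn't explain output that is already noise-like.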