chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Generating very large audio dataset #43

Closed saraalemadi closed 5 years ago

saraalemadi commented 5 years ago

Hi @chrisdonahue,

I am trying to generate a very large dataset (1M audio clips) from a trained WaveGAN model. I was able to incorporate a for loop to generate 4 audio clips; however, when I increase that to a million, I get an error related to this line in generate.py: _z = (np.random.rand(64, 100) * 2.) - 1.

Any tips on how to resolve this issue?

Thanks, Sara

Full generate.py code with my modification for your reference:

import argparse
import glob
import sys
import os
import librosa
import numpy as np
import tensorflow as tf

def get_arguments():
  parser = argparse.ArgumentParser(description='WaveGan generation script')
  parser.add_argument(
        'checkpoint', type=str, help='Which model checkpoint to generate from e.g. "(fullpath)/model.ckpt-XXX"')
  parser.add_argument('--train_dir', type=str, help='Training directory')
  parser.add_argument('--wav_out_path', type=str, help='Path to output wav file')
  arguments = parser.parse_args()

  return arguments

def main():
  args = get_arguments()
  infer_dir = os.path.join(args.train_dir, 'infer')
  infer_metagraph_fp = os.path.join(infer_dir, 'infer.meta')
  tf.reset_default_graph()
  saver = tf.train.import_meta_graph(infer_metagraph_fp)
  graph = tf.get_default_graph()
  sess = tf.InteractiveSession()
  saver.restore(sess, args.checkpoint)
  _z = (np.random.rand(64, 100) * 2.) - 1.
  z = graph.get_tensor_by_name('z:0')
  G_z = graph.get_tensor_by_name('G_z:0')[:, :, 0]
  waveform = sess.run(G_z, {z: _z})
  ndisplay = 64
  for i in range(ndisplay):
    librosa.output.write_wav(args.wav_out_path+str(i), waveform[i], 16000)
  sess.close()

  print('Finished generating.')

if __name__ == '__main__':
  main()

chrisdonahue commented 5 years ago

You'll want to go through it in batches. Pseudocode:

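BATCH_SIZE = 50
NUM_GEN = 1000000
for i in range(0, NUM_GEN, BATCH_SIZE):
  # Sample a fresh batch of latent vectors for this iteration only
  _z = (np.random.rand(BATCH_SIZE, 100) * 2.) - 1.
  # Run the generator on _z, as in generate.py
  _G_z = ...
  for j in range(BATCH_SIZE):
    # Clip number i + j overall; use it to give each file a unique name
    clip = _G_z[j]
    librosa.output.write_wav(...)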
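
The batching idea above can be sketched as a small, self-contained helper. This is only an illustration under stated assumptions, not code from the repository: `generate_in_batches`, `run_generator`, and `fake_generator` are hypothetical names, and the clip length of 16384 samples in the stand-in generator is assumed for the example. The helper also handles a final batch that is smaller than the rest, which the pseudocode does not need because 1000000 is a multiple of 50.

```python
import numpy as np


def generate_in_batches(run_generator, num_gen, batch_size=50, latent_dim=100):
    """Yield (index, clip) pairs for num_gen clips, one batch at a time.

    run_generator is a hypothetical callable that maps an array of latent
    vectors with shape [n, latent_dim] to an array of n waveforms.
    """
    for start in range(0, num_gen, batch_size):
        # The last batch is smaller when num_gen is not a multiple of batch_size.
        n = min(batch_size, num_gen - start)
        # Latent vectors sampled uniformly from [-1, 1), as in generate.py.
        _z = (np.random.rand(n, latent_dim) * 2.) - 1.
        _G_z = run_generator(_z)
        for j in range(n):
            # start + j is the clip's index across the whole run.
            yield start + j, _G_z[j]


def fake_generator(_z):
    """Stand-in for the trained model so the sketch runs without a checkpoint.

    The clip length (16384 samples) is an assumption made for this example.
    """
    return np.zeros((_z.shape[0], 16384), dtype=np.float32)


# Only one batch of waveforms is held in memory at any time.
for index, clip in generate_in_batches(fake_generator, num_gen=120):
    pass  # write clip to disk here, using index to build a unique file name
```

With the session restored as in the generate.py above, `run_generator` could plausibly be `lambda _z: sess.run(G_z, {z: _z})`, and each clip could be written with the same `librosa.output.write_wav` call the script already uses, with `index` in the file name so earlier clips are not overwritten.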