facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

Generate via Command Line #187

Open geograman opened 1 year ago

geograman commented 1 year ago

My Web-GUI is running fine. Is there any documentation on how to generate audio from the command line?

tob-har commented 1 year ago

There is now some API documentation: https://facebookresearch.github.io/audiocraft/api_docs/audiocraft/index.html

I am using this little script as a starting point. Just run it from the terminal (tested on a Mac with an M1, on CPU):

import torchaudio  # only needed for the commented-out melody example below
import datetime

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# set date_time for file name
current_date_time = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

model = MusicGen.get_pretrained('facebook/musicgen-small')
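model.set_generation_params(
    use_sampling=True,
    top_k=250,        # int
    top_p=1,          # float; per the audiocraft docs, top_k is only used when top_p is 0
    temperature=0.8,  # float
    cfg_coef=9.0,     # float, classifier-free guidance coefficient
    # extend_stride=20,
    duration=2,       # length of the generated audio, in seconds
)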
# wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
#descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
descriptions = [
    'Portishead infused slow trip hop beat with old school samples rhodes pianos and sub bass, turn table static noise, hi quality'
    ]

wav = model.generate(descriptions)  # generates one sample per description in the list

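# melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions
# (melody conditioning needs a melody checkpoint such as 'facebook/musicgen-melody').
# wav = model.generate_with_chroma(descriptions, melody[None].expand(len(descriptions), -1, -1), sr)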

for idx, one_wav in enumerate(wav):
    file_name = f'{idx}_{current_date_time}'

    # writes <file_name>.wav to the current directory
    audio_write(
        file_name,
        one_wav.cpu(),
        sample_rate=model.sample_rate,
        format="wav",
        normalize=True,
        strategy="loudness",  # clip, peak, rms, loudness
        loudness_headroom_db=16,
        # peak_clip_headroom_db=1.0,
        # rms_headroom_db=18,
        loudness_compressor=True,
        log_clipping=True,  # only when strategy = loudness
    )
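
To pass the prompt and duration as actual command-line arguments instead of editing the script each time, something along these lines might work. This is only a rough sketch: the flag names (`--model`, `--duration`, `--top-k`, ...) and the helpers `build_parser`, `output_stem` and `main` are made up for illustration and are not part of audiocraft. The defaults are copied from the script above, and the audiocraft calls are the same ones it uses.

```python
import argparse
import datetime


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI options; defaults are copied from the script above."""
    parser = argparse.ArgumentParser(
        description="Generate audio with MusicGen from text prompts.")
    parser.add_argument("descriptions", nargs="+",
                        help="one or more prompts; one clip is generated per prompt")
    parser.add_argument("--model", default="facebook/musicgen-small",
                        help="name of the pretrained checkpoint")
    parser.add_argument("--duration", type=float, default=2.0,
                        help="length of each clip in seconds")
    parser.add_argument("--temperature", type=float, default=0.8)
    parser.add_argument("--top-k", type=int, default=250)
    parser.add_argument("--top-p", type=float, default=1.0)
    parser.add_argument("--cfg-coef", type=float, default=9.0)
    return parser


def output_stem(idx: int, timestamp: str) -> str:
    """File name without extension, e.g. '0_20240101_120000'."""
    return f"{idx}_{timestamp}"


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)

    # Imported here so that --help responds without loading torch first.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained(args.model)
    model.set_generation_params(
        use_sampling=True,
        top_k=args.top_k,
        top_p=args.top_p,
        temperature=args.temperature,
        cfg_coef=args.cfg_coef,
        duration=args.duration,
    )

    wav = model.generate(args.descriptions)  # one sample per prompt

    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    for idx, one_wav in enumerate(wav):
        audio_write(
            output_stem(idx, timestamp),
            one_wav.cpu(),
            sample_rate=model.sample_rate,
            strategy="loudness",
            loudness_headroom_db=16,
            loudness_compressor=True,
        )
```

Saved as, say, `generate.py` (the file name is arbitrary) with a call to `main()` added at the bottom under the usual `if __name__ == "__main__":` guard, it could then be run as `python generate.py "happy rock" "sad jazz" --duration 5`.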