descriptinc / audiotools

Object-oriented handling of audio data, with GPU-powered augmentations, and more.
https://descriptinc.github.io/audiotools/
MIT License

Adding Transforms #20

Closed. pseeth closed this 2 years ago

pseeth commented 2 years ago

This PR adds to and extends a lot of functionality in AudioTools. The main feature is a transform pipeline, which makes it easy to augment your audio signals, like so:

from audiotools import AudioSignal
import audiotools.data.transforms as tfm

audio_path = "tests/audio/spk/f10_script4_produced.wav"
signal = AudioSignal(audio_path, offset=10, duration=2)

# prob indicates the probability that the transform is applied.
# transforms are applied in sequence.
transform = tfm.Compose([
    tfm.LowPass(prob=0.8),
    tfm.ClippingDistortion(prob=0.5),
    tfm.MuLawQuantization(),
    tfm.VolumeChange(),
])

seed = 0
# instantiating a transform creates a dictionary with all the
# parameters it needs to run. you add the necessary "signal" key
# to this dictionary and then pass it into the transform when you
# call it.
batch = transform.instantiate(seed, signal)
batch["signal"] = signal
batch = transform(batch)

# original signal is at batch["original"]
# augmented signal is at batch["signal"]
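
The instantiate-then-call pattern above can be illustrated without audiotools. The following is a minimal sketch under stated assumptions, not the library's implementation: `ToyGain`, its `"apply"` and `"gain"` keys, and the gain range are all hypothetical names and values chosen for illustration. It shows why separating the two steps is useful: all randomness is resolved from the seed when the parameters are drawn, so calling the transform with those parameters is repeatable.

```python
import random


class ToyGain:
    """Hypothetical transform: scales samples by a randomly drawn gain."""

    def __init__(self, prob=1.0):
        self.prob = prob

    def instantiate(self, seed):
        # Draw every random quantity here, from the seed alone.
        rng = random.Random(seed)
        return {
            "apply": rng.random() < self.prob,
            "gain": rng.uniform(0.5, 1.5),
        }

    def __call__(self, batch):
        # Work on a copy, keep the input under "original", and
        # write the (possibly) augmented samples to "signal".
        batch = dict(batch)
        batch["original"] = list(batch["signal"])
        if batch["apply"]:
            batch["signal"] = [x * batch["gain"] for x in batch["signal"]]
        return batch


samples = [0.0, 0.25, -0.5, 1.0]
transform = ToyGain(prob=0.8)

# Same seed, same parameters, same output.
batch = transform.instantiate(seed=0)
batch["signal"] = samples
out_a = transform(batch)

batch = transform.instantiate(seed=0)
batch["signal"] = samples
out_b = transform(batch)
```

Here `out_a["signal"]` and `out_b["signal"]` are identical because both runs used parameters drawn from the same seed.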

In addition to the above, we've added more core functionality:

New effects:

Better indexing:

Data utilities:

Discourse tools:

If key only refers to the batch index, then the _loudness and stft_data tensors also come along for the ride.
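
As a rough illustration of the indexing note above, here is a toy sketch of a batched container whose cached per-item values follow a batch-only index. `ToyBatch` is hypothetical and is not the AudioSignal implementation; it uses plain lists instead of tensors and models only a `_loudness` cache. What happens for keys that also index other axes is not stated above, so dropping the cache in that case is purely an assumption of this sketch.

```python
class ToyBatch:
    """Hypothetical batched signal; not the AudioSignal implementation."""

    def __init__(self, audio_data, loudness=None):
        self.audio_data = audio_data  # one list of samples per batch item
        self._loudness = loudness     # one cached value per batch item, or None

    @staticmethod
    def _as_slice(key):
        # Turn an int into a length-1 slice so the batch axis is kept.
        if isinstance(key, int):
            return slice(key, key + 1 or None)
        return key

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            # The key touches only the batch axis, so cached values stay valid.
            key = self._as_slice(key)
            loudness = None if self._loudness is None else self._loudness[key]
            return ToyBatch(self.audio_data[key], loudness)
        # Assumption: any other key changes the samples, so the cache is dropped.
        batch_key, sample_key = key
        rows = self.audio_data[self._as_slice(batch_key)]
        return ToyBatch([row[sample_key] for row in rows], loudness=None)


batch = ToyBatch([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], loudness=[-23.0, -18.0])
item = batch[1]          # batch index only: cached loudness is kept
cropped = batch[1, 0:2]  # batch index plus sample range: cache is dropped
```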

Some bug fixes:

Some API changes:

Transforms testing

One quick note about how transforms are tested. Transforms are tested only with smoke tests: we check that they run and that, when run twice with the same parameters, they produce the same audio file. When you create a Transform in data/transforms.py, a test is created for it automatically. If the regression data (an audio file) is not already there, it gets created at tests/regression/transforms/[transform_name].wav when you run:

python -m pytest -k test_transforms

If the regression audio file is already there, the test compares the output of the transform against the regression audio.
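
The create-or-compare behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the test code in this PR: `check_regression` and `toy_transform` are hypothetical names, the regression data is stored as JSON rather than as a .wav file, and the comparison tolerance is arbitrary.

```python
import json
import tempfile
from pathlib import Path


def check_regression(name, output, regression_dir):
    """Create the regression file on first run; compare against it afterwards."""
    path = Path(regression_dir) / f"{name}.json"
    if not path.exists():
        # First run: store the current output as the reference.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(output))
        return "created"
    # Later runs: the output must match the stored reference.
    expected = json.loads(path.read_text())
    assert len(expected) == len(output), "length mismatch"
    assert all(abs(a - b) < 1e-6 for a, b in zip(expected, output)), "output drifted"
    return "matched"


def toy_transform(samples):
    # Hypothetical deterministic transform standing in for a real one.
    return [0.5 * x for x in samples]


with tempfile.TemporaryDirectory() as tmp:
    first = check_regression("toy_transform", toy_transform([0.0, 0.5, 1.0]), tmp)
    second = check_regression("toy_transform", toy_transform([0.0, 0.5, 1.0]), tmp)
```

One consequence of this design is that deleting a regression file makes the next run regenerate it instead of failing, so intentional changes to a transform's output require removing the stale file first.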