This PR adds and extends a lot of functionality in AudioTools. The main feature is a transform pipeline, which makes it easy to augment your audio signals, like so:
from audiotools import AudioSignal
import audiotools.data.transforms as tfm
audio_path = "tests/audio/spk/f10_script4_produced.wav"
signal = AudioSignal(audio_path, offset=10, duration=2)
# prob indicates the probability that the transform is applied.
# transforms are applied in sequence.
transform = tfm.Compose([
    tfm.LowPass(prob=0.8),
    tfm.ClippingDistortion(prob=0.5),
    tfm.MuLawQuantization(),
    tfm.VolumeChange(),
])
seed = 0
# transforms get instantiated, which creates a dictionary with all the
# parameters they need to run the transform. you pass this dictionary
# into the transform when you call it, after adding the necessary
# "signal" key.
batch = transform.instantiate(seed, signal)
batch["signal"] = signal
batch = transform(batch)
# original signal is at batch["original"]
# augmented signal is at batch["signal"]
In addition to the above, we've added more core functionality:
clone: A new function for cloning a signal, rather than needing to deepcopy all the time.
detach: Detaches all the constituent tensors for a signal from gradients.
shape: Get the shape more easily.
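For illustration, here is a minimal sketch of how these might be used together. The gain value is made up, and shape is used as a property here, which is an assumption about the exact API:
from audiotools import AudioSignal

signal = AudioSignal("tests/audio/spk/f10_script4_produced.wav", offset=10, duration=2)
# clone() gives an independent copy, so edits don't touch the original signal.
copy = signal.clone()
copy.volume_change(-6)
# detach() drops autograd history on the underlying tensors, useful after a
# signal has passed through a differentiable pipeline.
copy = copy.detach()
# shape exposes the shape of the underlying audio data directly.
print(signal.shape)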
New effects:
high_pass: Added a high_pass effect.
apply_ir: New function that runs an entire augmentation pipeline and then applies an impulse response.
mix: Now optionally takes other_eq, which applies .equalizer to the other signal being mixed in.
volume_change: Boosts or cuts volume of a signal.
clip_distortion: Makes a signal clip.
quantization: Quantizes the samples of a signal.
mulaw_quantization: Same as above, but uses mu-law to quantize.
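A rough sketch chaining a few of these effects. The cutoff, gain, and quantization values below are made-up illustrations, and the exact argument names may differ slightly from the final API:
signal = AudioSignal(audio_path, offset=10, duration=2)
signal.high_pass(300)           # remove content below roughly 300 Hz
signal.volume_change(-10)       # cut the volume by 10 dB
signal.clip_distortion(0.1)     # clip the loudest samples
signal.mulaw_quantization(128)  # mu-law quantize to 128 levels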
Better indexing:
signal[key] now returns an AudioSignal object, instead of an array.
signal[key] = other_signal_or_array now works. The associated parts of signal.audio_data will be set.
If key only refers to the batch index, then the _loudness and stft_data tensors also come along for the ride.
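A small illustration of the new behavior, assuming integer indexing along the batch dimension (the edit applied to the slice is just an example):
sub = signal[0]        # an AudioSignal for batch item 0, not a raw array
sub.volume_change(-6)  # edit the slice...
signal[0] = sub        # ...and write it back into signal.audio_data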
Data utilities:
A collate function has been added to use with DataLoader objects on datasets that return AudioSignals.
CSV utilities: creation and reading of CSVs based on folders of audio files.
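A hypothetical sketch of wiring the collate function into a DataLoader. The dataset class below is made up and the import path of collate is an assumption; only the idea that AudioSignal-returning datasets need a custom collate_fn comes from this PR:
from torch.utils.data import DataLoader, Dataset
from audiotools import AudioSignal
from audiotools.core.util import collate  # assumed import path for the new collate function

class FolderOfAudio(Dataset):
    # hypothetical stand-in for a dataset that returns AudioSignals
    def __init__(self, paths):
        self.paths = paths
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, idx):
        return AudioSignal(self.paths[idx], duration=2)

paths = ["tests/audio/spk/f10_script4_produced.wav"] * 8
loader = DataLoader(FolderOfAudio(paths), batch_size=4, collate_fn=collate)
for batch in loader:
    print(batch)  # batch holds the collated AudioSignals, ready for a training loop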
Discourse tools:
Made some small changes to audio_table that allow you to put non-AudioSignal objects as values in a dictionary, and they'll render appropriately.
Added a newline before printing the table, in case earlier prints would otherwise break the table syntax.
Exposed some necessary kwargs up to post.disp, to specify things like the name of the first_column.
Some bug fixes:
self._loudness now gets moved to the appropriate device inside of .to.
self.stft_data also gets moved to the appropriate device.
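For example (a minimal sketch that assumes a CUDA device is available, and that loudness and STFT have already been computed so the cached tensors exist):
signal.loudness()           # populates signal._loudness
signal.stft()               # populates signal.stft_data
signal = signal.to("cuda")  # audio_data, _loudness, and stft_data now all move together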
Some API changes:
Indexing works differently as mentioned above.
.windows() returns AudioSignal objects, instead of arrays.
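For example, iterating over windows now yields AudioSignal objects directly. The window and hop durations below are illustrative, and the exact signature of .windows() may differ:
for window in signal.windows(window_duration=0.5, hop_duration=0.25):
    # each window is an AudioSignal, so effects and properties work on it directly
    print(window.shape)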
Transforms testing
One quick note about how transforms are tested. Transforms are tested as smoke tests: we make sure they run, and that running them twice with the same parameters produces the same audio file. When you create a Transform in data/transforms.py, a test will automatically be created for it. The regression data (an audio file) gets created at tests/regression/transforms/[transform_name].wav if it is not already there when you run:
python -m pytest -k test_transforms
If the regression audio file already exists, the test compares the transform's output against the regression audio.
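Conceptually, each auto-generated test does something like the sketch below. This is a hedged illustration of the logic described above, not the literal test code; the helper name, the allclose comparison, and the write/read round trip are assumptions:
import torch
from pathlib import Path
from audiotools import AudioSignal

def check_transform(transform, signal, regression_path: Path, seed: int = 0):
    # instantiate parameters once, then apply the transform twice with them
    batch = transform.instantiate(seed, signal)

    batch["signal"] = signal.clone()
    out_a = transform(dict(batch))["signal"]
    batch["signal"] = signal.clone()
    out_b = transform(dict(batch))["signal"]

    # same parameters -> same audio
    assert torch.allclose(out_a.audio_data, out_b.audio_data)

    if not regression_path.exists():
        out_a.write(regression_path)  # first run: create the regression audio
    else:
        expected = AudioSignal(regression_path)
        assert torch.allclose(out_a.audio_data, expected.audio_data)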