It's in a working and semi-usable state!
```python
# TODO: WRITE SOME REAL TESTS
import time

import torch
from torch_audiomentations import Compose, PitchShift

# Initialize augmentation callable
apply_augmentation = Compose(
    transforms=[
        PitchShift(16000, p=1),
    ]
)

torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo)
# and 2 s of 16 kHz audio.
audio_samples = (
    torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5
)

# Apply augmentation. This pitch-shifts (some of) the audio snippets
# in the batch.
start = time.process_time()
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)
print(time.process_time() - start)
```

Output:

```
1.53125
```
Note: the code was run on a GPU!
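As a side note, CUDA kernels launch asynchronously, so `time.process_time()` (which measures CPU time) may not reflect the actual GPU work. A minimal sketch of a more robust measurement, reusing the names from the snippet above:

```python
import time

import torch

if torch_device.type == "cuda":
    torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
start = time.perf_counter()
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)
if torch_device.type == "cuda":
    torch.cuda.synchronize()  # make sure the augmentation has actually finished
print(time.perf_counter() - start)
```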
I haven't implemented any tests; I'll leave that up to you guys :)
Please let me know what you think of what I have so far!
Thanks for making this, and thank you for your patience. I'll have a look at this soon 👍
I tried using it in my own project, and it seems like there's a memory leak somewhere? VRAM usage keeps increasing when I include PitchShift. Will investigate.
Interesting. The memory leak should ideally be fixed before we merge this :)
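For what it's worth, one way to narrow down a leak like this is to apply the transform in a loop and watch the allocated VRAM; a minimal sketch, assuming the `apply_augmentation` and `audio_samples` objects from the snippet above:

```python
import torch

# If allocated memory keeps climbing across iterations, some tensor from a
# previous call is still being referenced (e.g. cached on the transform).
for i in range(20):
    with torch.no_grad():
        _ = apply_augmentation(audio_samples, sample_rate=16000)
    print(f"iter {i}: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")
```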
Thanks for the reviews, will take a look soon :)
Alright gonna get some sleep now, good night 😴
Adding `Batched_Pitch_Shift` as a separate class might be a good idea. Currently working on batched shifting in the library itself :)
Implementation of batched transforms is done in the library! Will update the fork when I have time later.
Timed in seconds. Really liking how fast it runs! This is with 8 samples at sr=16000, each clip 2 seconds long.
@iver56 What's new: `per_batch`, `per_example`, and `per_channel` modes. `per_batch` is the default because it's fast.

Thanks for the improvements 👍 I will re-review this soon-ish (my availability is a bit limited, thanks for your patience)
I would like "per_example" to be the default mode. Although "per_batch" is faster, variation within each batch is typically a good idea when training models :)
The other transforms have "per_example" as the default mode too
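If the default changes, callers who want the faster behavior could still opt in explicitly. A hypothetical sketch, assuming `PitchShift` ends up exposing the same `mode` argument as the other transforms:

```python
from torch_audiomentations import Compose, PitchShift

apply_augmentation = Compose(
    transforms=[
        # mode="per_batch" applies one random shift to the whole batch (faster);
        # mode="per_example" draws an independent shift for each example.
        PitchShift(16000, p=1, mode="per_batch"),
    ]
)
```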
Something happened after the force push: I can't see the commits in the pull request anymore.
I would also prefer a bit more modest default parameters: pitch shifting a whole octave up or down is a bit extreme. In audiomentations the default is -4 to +4 semitones. -4 semitones is down a third of an octave, and +4 semitones is up a third of an octave. This default would give a range of two thirds of an octave.
In audiomentations, the pitch shifting parameters are input as semitones. Could that be relevant here too? I personally find it easier to relate to the numbers when they are given in semitones (e.g. -12 and +12) instead of fractions (e.g. 0.5 and 2.0)
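For reference, under equal temperament the two parameterizations convert as rate = 2^(semitones/12), so ±12 semitones corresponds exactly to the 0.5–2.0 range; a small helper to illustrate:

```python
def semitones_to_rate(semitones: float) -> float:
    # Equal-temperament conversion: one octave (12 semitones) doubles the rate.
    return 2.0 ** (semitones / 12.0)

print(semitones_to_rate(12))   # 2.0
print(semitones_to_rate(-12))  # 0.5
print(semitones_to_rate(4))    # ~1.26 (the proposed +4 semitone default)
```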
@iver56 I think what happened is that the force push overwrote all my commits. I'll patch it up and open a new PR
This PR is still a work in progress, but here is the gist of it:

- The `PitchShift` class basically just calls this library to transform each sample in the batch (a rough sketch of this per-example approach is below).
- I haven't fully worked out how it should fit into the `Compose` process; I'm sure y'all can figure something out. The current class is just a proof of concept.
- (There's a `test.py` at the root of the repo, which I'll have to delete later.)

The library I made is still undocumented, so I don't want to make it public just yet. If you want to verify that the code is not malicious, let me know so I can add you to the repo!
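A rough sketch of the per-example approach described above. The `shift_pitch` function is a hypothetical stand-in for the call into the private library, not its actual API:

```python
import torch

def shift_pitch(example: torch.Tensor, sample_rate: int, rate: float) -> torch.Tensor:
    # Hypothetical stand-in for the external pitch-shift library call.
    return example

def apply_per_example(batch: torch.Tensor, sample_rate: int, rate: float) -> torch.Tensor:
    # batch has shape (batch_size, num_channels, num_samples); transform each
    # example independently, then re-stack into a batch.
    return torch.stack([shift_pitch(example, sample_rate, rate) for example in batch])
```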