dhmit / sonification


Future Adaptations to the FX Module #9

Open · Watermelanie opened this issue 3 years ago

Watermelanie commented 3 years ago

The purpose of this module is to create filter functions to modify input sound waves for greater flexibility in the audio that is produced from user input.

The basic filter functions currently implemented include apply_filter, get_notes, change_volume, stretch_audio, change_pitch, and add_chords.

Basic tests for most of these functions have been implemented in tests.py. More comprehensive tests still need to be written.

joshfeli commented 3 years ago

Improvements to Our Filter Functions

Changing the Volume

Our current implementation of change_volume takes a relative amplitude and multiplies the NumPy array by that factor. This scales the amplitude linearly, but loudness is perceived logarithmically. We would need a new change_volume filter that takes this into account yet still accepts some "amplitude factor" that's intuitive for the user.

However, there's also the complication of audio clipping, which may arise from the limited range of a 16-bit audio signal (the standard data type for the audio we produce in our analysis functions). These two Wikipedia articles describe the problem in greater detail. How can we reconcile the user's desire to hear louder audio with the problem of clipping? Should we leave volume changes to the computer?
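As a starting point, a decibel-based gain with a hard clip to the 16-bit range might look something like the sketch below. This is only an illustration of the idea, not our current API; the function name and the int16 assumption are mine.

```python
import numpy as np

def change_volume_db(signal, gain_db):
    """Scale a 16-bit audio signal by a gain given in decibels.

    Decibels roughly match how loudness is perceived: +6 dB about doubles
    the amplitude, -6 dB about halves it.
    """
    factor = 10 ** (gain_db / 20.0)            # convert dB to an amplitude factor
    scaled = signal.astype(np.float64) * factor
    # Hard-clip to the int16 range so the cast below can't wrap around.
    # Heavy gain will still sound distorted, which is exactly the
    # clipping trade-off described above.
    return np.clip(scaled, -32768, 32767).astype(np.int16)
```

An alternative to hard clipping would be to normalize the whole signal after applying the gain, but that silently undoes part of the requested volume change, which is why the question of leaving volume to the computer comes up.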

Adding chords

The add_chords filter currently builds major or minor triads on a sequence of notes, depending on the fundamental frequency of each note (i.e., whether the pitch is higher or lower than F4). After creating some test audio, however, we've found that add_chords introduces a lot of distortion to the audio signal. Is there some clipping involved that causes this distortion (see the above section for some links describing audio clipping)? Can we address the issues of changing the volume and adding chords simultaneously? Can we allow for a wider class of chords to be created? Power chords? Sevenths? Other jazzy alternatives? The possibilities are limitless!
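For reference, the distortion is consistent with summing several full-amplitude voices into an int16 buffer without rescaling. Below is a hedged sketch of one way to mix a triad and keep the peak inside the 16-bit range; the helper name, the equal-tempered ratios, and the resampling trick are illustrative assumptions, not how add_chords is currently written.

```python
import numpy as np
from scipy.signal import resample

def mix_major_triad(note, headroom=0.9):
    """Mix a note slice with copies a major third and a perfect fifth higher,
    then rescale so the sum stays inside the int16 range instead of clipping."""
    chord = note.astype(np.float64)
    for semitones in (4, 7):                   # major third, perfect fifth
        ratio = 2 ** (semitones / 12)          # equal-tempered frequency ratio
        # Squeezing the slice into fewer samples (played back at the same
        # rate) raises its pitch by `ratio`; zero-pad so the voices start
        # together, even though the higher voices end a bit early.
        shifted = resample(note.astype(np.float64), int(len(note) / ratio))
        padded = np.zeros(len(note))
        padded[:len(shifted)] = shifted
        chord += padded
    peak = np.max(np.abs(chord))
    if peak > 0:                               # normalize below the int16 maximum
        chord *= headroom * 32767 / peak
    return chord.astype(np.int16)
```

The normalization step is the part that matters for the distortion question; the same structure would extend to minor triads, power chords, or sevenths just by changing the semitone offsets.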

Pitch Shifting and Time Scaling

The change_pitch and stretch_audio filters address two sides of the same coin. The fundamental pitches and duration of an audio signal are coupled by its sampling rate, but many filters (including ours) seek to isolate these qualities and filter them individually. This intro lecture presents the main problem and some approaches to a solution, and this Wikipedia article has some additional information.

Currently, the stretch_audio function works by taking a note slice as input and repeating copies of the note until the desired duration is reached, and the change_pitch function calls stretch_audio as a result. While this approach works decently for pure sine waves, notes with a more natural musical envelope would not fare as well. It could also lead to problems with our note detection functions, since laying down copies of the original note also lays down extra copies of the note onset.

A more robust implementation of stretch_audio and change_pitch is needed to resolve the issues of note playback and onset detection. One solution might be to implement a phase vocoder or some other algorithm. Such a solution may still allow change_pitch to call stretch_audio or some equivalent time scaling function, depending on the method used.
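If we don't want to hand-roll a phase vocoder right away, librosa ships one. Assuming librosa were added as a dependency (it isn't one I've confirmed in this project), the relationship between the two filters could look like the sketch below; the function names and parameters are placeholders, not our current signatures.

```python
import numpy as np
import librosa
from scipy.signal import resample

def stretch_audio_pv(signal, duration_factor):
    """Time-stretch a signal with librosa's phase vocoder: the duration is
    multiplied by duration_factor while the pitch stays the same."""
    y = signal.astype(np.float32)
    return librosa.effects.time_stretch(y, rate=1.0 / duration_factor)

def change_pitch_pv(signal, pitch_factor):
    """Change the pitch by pitch_factor without changing duration: stretch the
    audio to pitch_factor times its length, then resample it back to the
    original number of samples so it plays correspondingly faster or slower."""
    stretched = stretch_audio_pv(signal, pitch_factor)
    return resample(stretched, len(signal))
```

This keeps the current design, where change_pitch delegates to a time-scaling function, but replaces the copy-and-repeat strategy with one that preserves the note's envelope and doesn't duplicate its onset.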

Note Onset Detection

Our filter functions are structured to take an audio signal, slice it into sections that represent coincident notes, and concatenate the filtered notes to create a new audio signal. Thus, these filters rely on some note onset detection functions.
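In other words, the per-note filters all sit on top of a pattern like the following minimal sketch (apply_note_filter and its arguments are hypothetical names, not the current apply_filter signature).

```python
import numpy as np

def apply_note_filter(signal, onsets, note_filter):
    """Slice a signal at detected note-onset sample indices, apply a filter to
    each note slice, and concatenate the filtered slices back together.
    Assumes `onsets` holds the start index of every note, including the first."""
    boundaries = list(onsets) + [len(signal)]
    note_slices = [signal[start:end] for start, end in zip(boundaries, boundaries[1:])]
    return np.concatenate([note_filter(note) for note in note_slices])
```

Any error in the onset positions therefore propagates directly into every filter built this way, which is why the rest of this section focuses on onset detection.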

This paper presents a brief tutorial on note onset detection, where the section on spectral features of a signal makes use of the short-time Fourier transform (STFT). The STFT relies on Fourier Transforms, about which you can watch a general introduction here or read more in these free ebooks (the first seems to explain the math behind the DFT in great detail while the latter three seem to focus on audio applications). The Signal Processing Stack Exchange is also a great resource for learning more about the math and EE concepts behind the scenes!

The above paper mentions the possibility of using the magnitude and/or phase of STFT coefficients during "transient" regions of an audio signal to detect note onsets. The _spectral_difference function, which get_notes uses, relies on the magnitude of these coefficients to reduce an audio signal to one that highlights possible note onsets. We also tried implementing a _phase_deviation method that instead uses abrupt changes in the phase of the audio wave, but it doesn't work too well.

While _spectral_difference does a great job for the most part, it struggles to detect onsets for repeated notes (e.g., pressing the same note on a piano over and over again). This is likely because the same note has very similar STFT coefficient magnitudes from frame to frame, so its onsets go largely undetected by _spectral_difference. The phase between two consecutive notes, however, always changes, which hints that detecting changes in phase may work better. But, as the paper suggests, relying solely on phase could lead to problems with noise: a function that uses only phase makes little distinction between a piano note and a whisper in the background. One is more important in a recording than the other, but both have phase!

These two papers (which the first paper references as [25] and [26], respectively) discuss approaches for combining information about the phase and magnitude of Fourier coefficients for a more robust note detection function. There is also the possibility of using probabilistic methods, which could be interesting to explore!
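For concreteness, a magnitude-only onset detection function in the spirit of _spectral_difference (a generic spectral-flux sketch, not the project's actual implementation) reduces to a few lines of NumPy:

```python
import numpy as np

def spectral_flux(signal, frame_size=1024, hop_size=512):
    """Magnitude-based onset detection function: sum, per frame, the positive
    increases in STFT magnitude relative to the previous frame."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    magnitudes = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop_size : i * hop_size + frame_size]))
        for i in range(n_frames)
    ])
    # Half-wave rectify the frame-to-frame difference: only energy increases
    # count, since decreases usually correspond to a note dying away.
    increases = np.maximum(np.diff(magnitudes, axis=0), 0.0)
    return increases.sum(axis=1)               # one value per frame transition
```

A repeated piano note barely moves these magnitudes from frame to frame, which is exactly the failure mode described above; phase-aware or probabilistic variants change the quantity inside the sum rather than this overall structure.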

This third paper also presents some information on the subject with an excellent section on constructing peak-picking algorithms to act on the output of a note detection function (we get our dynamic threshold in _find_peaks from section 2.2).
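As an illustration of the peak-picking idea (again a generic sketch, not our _find_peaks), a moving-median dynamic threshold along those lines can be written as:

```python
import numpy as np

def pick_peaks(onset_strength, window=10, delta=0.1, lam=1.0):
    """Return indices of local maxima in an onset detection function that also
    exceed a dynamic threshold: delta + lam * median of a surrounding window."""
    peaks = []
    for n in range(1, len(onset_strength) - 1):
        lo, hi = max(0, n - window), min(len(onset_strength), n + window + 1)
        threshold = delta + lam * np.median(onset_strength[lo:hi])
        is_local_max = (onset_strength[n] >= onset_strength[n - 1]
                        and onset_strength[n] >= onset_strength[n + 1])
        if is_local_max and onset_strength[n] > threshold:
            peaks.append(n)
    return peaks
```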

Overall, we need a more robust note onset detection system to work with a wider range of input. Some of the tests in the app.tests.FiltersTestCase module fail and are commented out for this reason; improving these functions would allow for greater flexibility in manipulating audio signals newly created from other forms of data. Many of the above papers may present some unfamiliar concepts: this book and MIT's 6.003 may help with some of them!