endolith / waveform-analysis

Functions and scripts for analyzing waveforms, primarily audio. This is currently somewhat disorganized and unfinished.
MIT License
259 stars 82 forks source link
audio audio-analysis frequency-estimation noise rms waveform

example workflow codecov

This was originally several different scripts for measuring or processing waveforms, which I pasted on gist:

Since they have a lot in common, I'm trying to combine them into one repository, so I'm not duplicating effort. It's a mess so far, though.

Please don't blindly trust this. If you use this and find a stupid error, please let me know. Also let me know if you use this and it works perfectly. :D

Installation

This should now be an installable package, using:

pip install git+https://github.com/endolith/waveform-analysis.git@master

I'm in the process of splitting it into callable functions and scripts that use those functions.

Waveform analyzer

Currently this displays file information and measurements like crest factor and noise (including A-weighted noise, which is not usually included by other tools).

Usage: python wave_analyzer.py "audio file.flac"

For Windows' SendTo menu: pythonw wave_analyzer_launcher.py

Requires:

Recommended:

Optional:

A-weighting

Applies an A-weighting filter to a sound file stored as a NumPy array.

I was previously using the FFT filter in Adobe Audition to simulate an A-weighting filter. This worked for large signal levels, but not for low noise floors, which is what I was stupidly using it for.

Frequency estimator

A few simple frequency estimation methods in Python

These are the methods that everyone recommends when someone asks about frequency estimation or pitch detection. (Such as here: Music - How do you analyse the fundamental frequency of a PCM or WAV sample)

None of them work well in all situations, these are "offline", not real-time, and I am sure there are much better methods "in the literature", but here is some code for the simple methods at least.

THD+N calculator

Measures the total harmonic distortion plus noise (THD+N) for a given input signal, by guessing the fundamental frequency (finding the peak in the FFT), and notching it out in the frequency domain.

Example of usage, with 997 Hz full-scale sine wave generated by Adobe Audition at 96 kHz, showing the 16-bit quantization distortion:

> python thd_analyzer.py "../997 Hz from 32-bit to 16-bit no dither.wav"
Analyzing "../997 Hz from 32-bit to 16-bit no dither.wav"...
Frequency: 996.999998 Hz
THD+N:      0.0012% or -98.1 dB
A-weighted: 0.0006% or -104.1 dB(A)

(Theoretical SNR of a full-scale sine is 1.761+6.02⋅16 = −98.09 dB, so this seems right)

According to the never-wrong Wikipedia:

I'm not sure how much of the surrounding region of the peak to throw away. Should the "skirt" around the fundamental be considered part of the peak or part of the noise (jitter)? Probably the way to match other test equipment is to just calculate the width of a certain bandwidth, but is that really ideal?

Previously was finding the nearest local minima and throwing away everything between them. It works for some cases, but on peaks with wider "skirts", it gets stuck at any notches. Now using a fixed width ±10% of the fundamental, which works better.

For comparison, Audio Precision manual states:

Bandreject Amplitude Function Bandreject Response typically –3 dB at 0.725 fo & 1.38 fo –20 dB at fo ±10% –40 dB at fo ±2.5%

So this is Q factor 1.53 or 0.93 octaves? 2nd-order?

Adobe Audition with default dither, full-scale 997 Hz sine wave:

 8-bit    -49.8 dB
16-bit    -93.3 dB
32-bit   -153.8 dB

Art Ludwig's Sound Files:

File                        Claimed  Measured  (dB)
Reference           (440Hz) 0.0%     0.0022%   -93.3
Single-ended triode (440SE) 5.0%     5.06%     -25.9
Solid state         (440SS) 0.5%     0.51%     -45.8

Comparing a test device on an Audio Precision System One 22 kHz filtered vs recorded into my 96 kHz 24-bit sound card and measured with this script:

Frequency   AP THD+N    Script THD+N
40          1.00%       1.04%
100         0.15%       0.19%
100         0.15%       0.14%
140         0.15%       0.17%
440         0.056%      0.057%
961         0.062%      0.067%
1021        0.080%      0.082%
1440        0.042%      0.041%
1483        0.15%       0.15%
4440        0.048%      0.046%
9974        7.1%        7.8%
10036       0.051%      0.068%
10723       8.2%        9.3%
13640       12.2%       16.8%
19998       20.2%       56.3%  (nasty intermodulation distortion)
20044       0.22%       0.30%

(This was done with local minima method. Redo with 10% method.)

So it's mostly accurate. Mostly.

To do

Done