depp / skelly64

Tools for creating Nintendo 64 games
Mozilla Public License 2.0

VADPCM tracking #2

Closed: depp closed this issue 2 years ago

depp commented 3 years ago

In order to implement a VADPCM encoder / decoder, first need to understand how VADPCM works. Find reference materials or create them.

depp commented 2 years ago

Writing progress in the VADPCM thread in the N64brew Discord right now. Putting the important stuff here:

Basic project plan:

  1. Write VADPCM decoder I can run on PC (done),
  2. Write VADPCM encoder, which can use an existing codebook,
  3. Write codebook generator.

Codebook generation should be part of encoding. I see no likely use case for sharing codebooks between files (maybe there is a use case, but it just seems like such a weird thing to do).

Basic sketch of how codebook generator could work:

  1. (optional) Apply frequency weighting to signal (psychoacoustics, e.g. A-weighting or similar),
  2. Go block by block,
  3. Calculate autocorrelation matrix for each block (including the previous two samples),
  4. Use autocorrelation to calculate predictor coefficients,
  5. Calculate a weight for the block based on signal level,
  6. Use weighted k-means to create a codebook of predictor coefficients (or some other vector quantization technique).
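Step 3 above can be sketched roughly as follows. This is an illustration only, not the code from this repository: the function name `autocorrelation_matrix`, the 16-sample frame size, the order-2 predictor, and the zero padding before the start of the signal are all assumptions.

```python
def autocorrelation_matrix(samples, start, frame_size=16, order=2):
    """Return the (order+1) x (order+1) matrix of lagged products for one frame.

    Entry [i][j] is the sum, over every sample index n in the frame, of
    x[n-i] * x[n-j]. With order=2 this reaches back to the two samples
    before the frame. Indexes outside the signal are treated as zero
    (an assumption made for this sketch).
    """
    def x(n):
        return samples[n] if 0 <= n < len(samples) else 0.0

    size = order + 1
    m = [[0.0] * size for _ in range(size)]
    for n in range(start, start + frame_size):
        for i in range(size):
            for j in range(size):
                m[i][j] += x(n - i) * x(n - j)
    return m
```

Entry `[0][0]` of the result is the energy of the frame, so it is one possible starting point for the per-block weight in step 5. Whether that is the right weight is a separate question.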

These are all, dare I say, tractable. As part of this effort, I've written a VADPCM decoder, which got committed in 3784e5e7ee772ef44a483a045a0e1e5987aae099. It works.

The decoder program itself is written in C++ and I may rewrite it in Go using CGo because C++ is just kind of archaic.

depp commented 2 years ago

Working on the encoder. Wrote some simple code to calculate the autocorrelation matrices, and wrote something to calculate predictor coefficients.

The current predictor coefficient calculator works by calculating the eigendecomposition of the autocorrelation matrix, and taking the coefficients from an eigenvector with the smallest eigenvalue. Now that I write this out, I realize that this may not be correct, so I'm going to recheck the math.

Rather than just using k-means on the result, I think it makes more sense to assign frames to clusters that have the best predictor coefficients, and then recalculate the coefficients from the autocorrelation matrix of all frames in the cluster.

depp commented 2 years ago

Renamed this issue because I'm using it to track overall VADPCM implementation process.

The first draft of the core VADPCM encoder is written. I intend to break this into smaller commits and merge them with appropriate tests.

It turns out the correct way to solve for predictor coefficients is a bit simpler and doesn't require eigendecomposition; it just requires solving a small linear system of equations. This solver has been written, and should probably get its own tests because it is relatively self-contained.
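For a second-order predictor, one textbook way to set up such a system is the 2x2 normal equations of a least-squares fit: choose `a1`, `a2` to minimize the sum of `(x[n] - a1*x[n-1] - a2*x[n-2])**2` over the frame. The sketch below assumes a 3x3 matrix `m` where `m[i][j]` is the sum of `x[n-i] * x[n-j]`. The name `solve_predictor`, the matrix layout, and the fallback to zero coefficients for degenerate input are assumptions, and the solver in this repository may differ.

```python
def solve_predictor(m, eps=1e-12):
    """Solve the 2x2 normal equations for (a1, a2).

    The system is:
        [m11 m12] [a1]   [m01]
        [m12 m22] [a2] = [m02]
    It is solved with Cramer's rule. If the system is singular or nearly
    singular (for example a silent frame), return (0.0, 0.0) instead.
    """
    a, b, d = m[1][1], m[1][2], m[2][2]
    r1, r2 = m[0][1], m[0][2]
    det = a * d - b * b
    if det <= eps * a * d:
        return (0.0, 0.0)
    a1 = (r1 * d - b * r2) / det
    a2 = (a * r2 - b * r1) / det
    return (a1, a2)
```

As a sanity check, the ramp 1, 2, 3, 4 satisfies `x[n] = 2*x[n-1] - x[n-2]` exactly. Its matrix over the last two samples is `[[25, 18, 11], [18, 13, 8], [11, 8, 5]]`, for which this function returns `(2.0, -1.0)`.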

The codebook generation works by assigning each frame to a predictor, and then iteratively calculating predictor coefficients and reassigning each frame to the best predictor for that frame. I'll want to collect some statistics from this process to see if it's behaving the way I expect. I chose a deterministic algorithm for assigning frames to predictors, and may revisit this.
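The assign-and-refit loop described above might look something like this sketch. It takes one 3x3 matrix per frame (with `m[i][j]` the sum of `x[n-i] * x[n-j]`) and refits each predictor from the summed matrices of its frames. The round-robin initial assignment, the fixed cap on rounds, the zero coefficients for empty clusters, and every function name here are assumptions; the deterministic assignment actually used in this repository is not described in this thread.

```python
def solve(m):
    """Least-squares order-2 predictor from a 3x3 matrix; zeros if degenerate."""
    a, b, d = m[1][1], m[1][2], m[2][2]
    det = a * d - b * b
    if det <= 1e-12 * a * d:
        return (0.0, 0.0)
    return ((m[0][1] * d - b * m[0][2]) / det,
            (a * m[0][2] - b * m[0][1]) / det)


def error(m, coeffs):
    """Squared prediction error of coeffs on the frame summarized by m."""
    a1, a2 = coeffs
    return (m[0][0] - 2 * a1 * m[0][1] - 2 * a2 * m[0][2]
            + a1 * a1 * m[1][1] + 2 * a1 * a2 * m[1][2] + a2 * a2 * m[2][2])


def fit(matrices, assign, count):
    """Sum the matrices in each cluster and solve for one predictor each."""
    sums = [[[0.0] * 3 for _ in range(3)] for _ in range(count)]
    for m, k in zip(matrices, assign):
        for i in range(3):
            for j in range(3):
                sums[k][i][j] += m[i][j]
    return [solve(s) for s in sums]


def build_codebook(matrices, count, rounds=10):
    """Alternate between refitting predictors and reassigning frames."""
    # Deterministic starting point (an assumption): round-robin by index.
    assign = [i % count for i in range(len(matrices))]
    codebook = fit(matrices, assign, count)
    for _ in range(rounds):
        new_assign = [
            min(range(count), key=lambda k: error(m, codebook[k]))
            for m in matrices
        ]
        if new_assign == assign:
            break  # Converged: no frame changed predictor.
        assign = new_assign
        codebook = fit(matrices, assign, count)
    return codebook, assign
```

Counting how many frames land in each cluster after each round would be one simple statistic for checking whether the clustering behaves as expected.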

The last piece of the puzzle is the actual encoder. This is relatively straightforward. If we take an encoded file and reuse the codebook and predictor assignments, re-encoding the file should result in the same exact audio... although I'm not prepared to guarantee that.

depp commented 2 years ago

The encoder appears to work, but it produces very low-quality output. Don't use it.

First, going to set up a testing harness so I can iterate on the encoder more quickly. The test harness should make it easier to see the results of clustering (assess how even the clustering is), show SNR, and make it easy to hear decoded output. I've assembled a small corpus of music to test with.

Second, there are a couple places where I can look for improvement:

depp commented 2 years ago

First, found the bug. Before fixing it, added the test harness so I could see the stats for the SNR improvement.

The average SNR went from about 10 dB to 17 dB. Not great, but a very noticeable improvement.
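For reference, SNR in dB is conventionally computed as ten times the base-10 logarithm of signal power over noise power, where the noise is the difference between the original and the decoded audio. The sketch below uses that standard definition. The name `snr_db` is hypothetical, and how the test harness averages SNR across files or frames is not stated here.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a reference and its decoded copy."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf  # Perfect reconstruction.
    if signal == 0:
        return -math.inf  # Silent reference with a non-silent decode.
    return 10.0 * math.log10(signal / noise)
```

On this scale, a gain of 7 dB corresponds to roughly a fivefold reduction in noise power, since 10^0.7 is about 5.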

depp commented 2 years ago

It turns out various bugs were holding back the sound quality, and with those bugs fixed, the encoded audio sounds pretty good.

I'll close this issue once the VADPCM encoder/decoder is documented.

Any additional improvements can be done with separate issues -- see #4, #5, #6, #7, #8, and #9.

depp commented 2 years ago

Pushed VADPCM docs in 8858fef26a39fa34857f818ef3d56ad3affdbd06. This completes basic VADPCM support.