ad044 / LainTSX

WebGL implementation of the Serial Experiments Lain PSX game
GNU General Public License v3.0

Better audio conversion #47

Open Number969 opened 9 months ago

Number969 commented 9 months ago

Hello!

I couldn't help but notice that the audio quality of the nodes isn't the same as on the PlayStation. As stated previously in a pull request, converting the files with ffmpeg applies a low-pass filter to prevent aliasing. You could tell ffmpeg to skip the filter, but that ignores why the filter is there in the first place and introduces artifacts.

Resampling the audio to 44.1 kHz first, with something like r8brain, and then converting it with ffmpeg keeps the audio correctly dithered, introduces no (audible) artifacts, and preserves the frequency spectrum.
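The thread doesn't state the sample rate of the extracted XA tracks. Assuming they use the standard CD-XA ADPCM rates of 37.8 kHz or 18.9 kHz, converting to 44.1 kHz is an exact rational ratio. This minimal Python sketch computes that ratio with `fractions.Fraction`; `resample_ratio` and `output_length` are hypothetical helper names, not part of LainTSX or any resampler's API.

```python
from fractions import Fraction
from typing import Tuple

TARGET_RATE = 44100  # 44.1 kHz, the output rate proposed in the thread


def resample_ratio(src_rate: int, dst_rate: int = TARGET_RATE) -> Tuple[int, int]:
    """Return (up, down) so that dst_rate / src_rate == up / down in lowest terms."""
    ratio = Fraction(dst_rate, src_rate)  # Fraction reduces to lowest terms automatically
    return ratio.numerator, ratio.denominator


def output_length(n_in: int, src_rate: int, dst_rate: int = TARGET_RATE) -> int:
    """Number of whole output samples produced from n_in input samples."""
    return n_in * dst_rate // src_rate


# Assumed CD-XA source rates (an assumption; the thread does not state them).
for rate in (37800, 18900):
    up, down = resample_ratio(rate)
    print(f"{rate} Hz -> {TARGET_RATE} Hz: up {up}, down {down}")
```

For a 37.8 kHz source this gives a 7/6 ratio (upsample by 7, downsample by 6), and 7/3 for 18.9 kHz, so one second of input maps to exactly 44,100 output samples.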

Here is how Cou001 sounds currently:

https://github.com/ad044/LainTSX/assets/152321101/3adfc8e9-79e3-41d9-b7b9-442d94c7e079

Here is the same node with my proposed approach:

https://github.com/ad044/LainTSX/assets/152321101/0f3e9172-b8f4-4827-82ad-a747d1b37da0

On top of that, the file sizes are nearly identical: all of the audio-only nodes in the game currently total 166 MB, while my versions total 170 MB.

I'd be happy to send the files if you're interested, since I've already converted them. Otherwise, following this approach should give you the same results.

Cheers!

Number969 commented 9 months ago

A good alternative is ffmpeg with the SoX resampler library (soxr; this requires an ffmpeg build with libsoxr support). While not as exact as r8brain, it produces better results than ffmpeg's default resampler.

So running:

ffmpeg -i "LAIN01.XA[0].wav" -af aresample=resampler=soxr -ar 44100 output.wav

should give reasonably accurate results.
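To apply the soxr command above to every extracted track, a small Python wrapper can build the same ffmpeg arguments per file. This is a sketch under a few assumptions: an `ffmpeg` binary with libsoxr is on the `PATH`, the directory names in the usage comment are hypothetical, and `-y` is added so ffmpeg overwrites existing outputs instead of prompting. Passing an argument list to `subprocess.run` (no shell) also means bracketed names like `LAIN01.XA[0].wav` are never treated as glob patterns.

```python
import subprocess
from pathlib import Path
from typing import List


def build_soxr_command(src: Path, dst: Path, rate: int = 44100) -> List[str]:
    """Build the ffmpeg argument list for the soxr recipe from the thread."""
    return [
        "ffmpeg",
        "-y",  # overwrite dst without prompting (added for batch use)
        "-i", str(src),
        "-af", "aresample=resampler=soxr",  # use the SoX resampler
        "-ar", str(rate),  # output sampling rate
        str(dst),
    ]


def convert_all(src_dir: Path, dst_dir: Path) -> None:
    """Convert every .wav in src_dir into dst_dir, keeping file names."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.wav")):
        # No shell is involved, so "[0]" in a file name stays literal.
        subprocess.run(build_soxr_command(src, src_dir and dst_dir / src.name), check=True)


# Hypothetical usage:
# convert_all(Path("xa_wav"), Path("xa_44k"))
```

`check=True` makes the loop stop at the first file ffmpeg fails on, which surfaces problems such as an ffmpeg build without libsoxr immediately.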

spaztron64 commented 9 months ago

Indeed, as the PR notes, ffmpeg currently downsamples the input to a lower sampling rate and leaves the output to be interpolated by the client's output device itself. Since the PS1's output frequency is 44.1 kHz, as is the default output frequency of most consumer-grade computers today, ffmpeg should be configured to output its converted data at that sampling rate to prevent resampling inconsistencies.
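One way to confirm that converted files actually come out at 44.1 kHz is to read the WAV header with Python's standard `wave` module. This is a minimal sketch: `sample_rate` and `mismatched` are hypothetical helper names, and `wave` only reads PCM WAV files, which is what ffmpeg writes to `.wav` by default.

```python
import wave
from pathlib import Path
from typing import BinaryIO, Iterable, List, Union

TARGET_RATE = 44100  # the output rate argued for above

WavSource = Union[str, BinaryIO]


def sample_rate(src: WavSource) -> int:
    """Return the sampling rate stored in a PCM WAV header."""
    with wave.open(src, "rb") as wf:
        return wf.getframerate()


def mismatched(paths: Iterable[Path], target: int = TARGET_RATE) -> List[Path]:
    """List the WAV files whose sampling rate differs from target."""
    return [p for p in paths if sample_rate(str(p)) != target]


# Hypothetical usage:
# print(mismatched(Path("xa_44k").glob("*.wav")))
```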

Which resampling algorithm to use is a trickier question. We discussed audio output accuracy during development, but couldn't come up with an ideal solution that didn't involve emulating the PS1 SPU outright, or at least implementing its Gaussian interpolation ourselves. Neither FFmpeg nor any other general-purpose audio library provides a Gaussian interpolation implementation that matches, or even comes close to, the output of the PS1 SPU.
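To illustrate what implementing Gaussian interpolation ourselves would involve, here is a minimal Python sketch of 4-tap Gaussian-weighted interpolation used as a resampler. It is an approximation under stated assumptions: the real SPU uses a fixed hardware lookup table, whereas these weights come from a Gaussian curve with a hypothetical `sigma` parameter, and the tap layout (`TAP_OFFSETS`) and function names are illustrative. It will not reproduce the SPU's output bit for bit.

```python
import math
from typing import List, Sequence

# Hypothetical tap layout: neighbours around the read position i + frac.
TAP_OFFSETS = (-1, 0, 1, 2)


def gaussian_weights(frac: float, sigma: float = 0.75) -> List[float]:
    """Normalized Gaussian weights for the 4 taps at fractional phase 0 <= frac < 1.

    NOTE: sigma is a made-up tuning value; these are NOT the PS1 SPU table values.
    """
    raw = [math.exp(-((k - frac) ** 2) / (2.0 * sigma * sigma)) for k in TAP_OFFSETS]
    total = sum(raw)
    return [w / total for w in raw]  # weights sum to 1, so DC level is preserved


def gaussian_resample(
    samples: Sequence[float], src_rate: int, dst_rate: int, sigma: float = 0.75
) -> List[float]:
    """Resample by evaluating the 4-tap Gaussian kernel at each output position."""
    if not samples:
        return []
    last = len(samples) - 1
    n_out = len(samples) * dst_rate // src_rate
    out: List[float] = []
    for j in range(n_out):
        # Exact integer arithmetic: output j sits at input position j * src_rate / dst_rate.
        i, rem = divmod(j * src_rate, dst_rate)
        frac = rem / dst_rate
        acc = 0.0
        for k, w in zip(TAP_OFFSETS, gaussian_weights(frac, sigma)):
            acc += w * samples[min(max(i + k, 0), last)]  # clamp indices at the edges
        out.append(acc)
    return out
```

Because the kernel spans several input samples even at phase 0, the output is slightly smoothed rather than passing exactly through the input samples; matching the SPU would mean replacing `gaussian_weights` with its actual table.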

Of course, if accuracy isn't the goal, but rather the highest output quality possible, then r8brain is the way to go.