This project generates wav files that are slightly incorrect

padenot commented 12 months ago

I'm a developer working on Firefox, I came across https://evuraan.info/evuraan/test/wav/audio.html and your reddit post.

The reason for this bug is that the wav files generated by this project are somewhat incorrect. Firefox used to accept them by chance, but after some changes I made we failed to decode them. I've written a patch to restore the previous behaviour (accept the invalid file and attempt to do something sensible with them), but I figured I'd open an issue here since this might well be an easy fix on your end. I've also written a test, so this won't regress in the future.

The problem is that the channel count and the channel mask in the WAVEFORMATEXTENSIBLE section of the file don't match. On https://evuraan.info/evuraan/test/wav/audio.html, we can see:

...
Channel(s)                               : 2 channels
Channel layout                           : L R C LFE Lb Rb Lscr Rscr Cb Ls Rs Tc Tfl Tfc Tfr Tbl Tbc Tbr
...
...
...
16le @ 0x558475825340] Channel layout '32 channels (FL+FR+FC+LFE+BL+BR+FLC+FRC+BC+SL+SR+TC+TFL+TFC+TFR+TBL+TBC+TBR+DL+DR+WL)' with 32 channels does not match specified number of channels 2: ignoring specified channel layout

mediainfo says that the file as two channels, but that the channel layout lists a lot of channels, but because the number of channels is two it should read L R.

ffmpeg clearly states that it's incorrect, but also explains what it does. This is also the approach that I implemented in Firefox: if the channel count and the channel mask disagree, disregard the channel mask, and instead use one of the "standard" mask: 2 channels map to stereo, 6 channels to 5.1, etc.

A reference for this format can be found here: https://learn.microsoft.com/en-us/windows/win32/api/mmreg/ns-mmreg-waveformatextensible. Quoting from this page:

The number of bits set in dwChannelMask should be the same as the number of channels specified in WAVEFORMATEX.nChannels.

Lastly, if you see anything looking (or sounding!) weird in Firefox, please don't hesitate to open a bug at https://bugzilla.mozilla.org/enter_bug.cgi?product=Core (you can log in with your GitHub account), putting padenot@mozilla.com in the CC field, if it has anything to do with audio or video in any capacity. Alternatively, you can reach me via email at padenot@mozilla.com.

evuraan commented 12 months ago

Wow, spot on! Thank you for taking the time to explain.

For perceptible hearing reasons, I prefer wav files - and for this project, since we are dealing with hdFM media, I also have to make the wav format "streamable". I accomplish this by changing the wav header. Although the resulting wav is slightly out of spec, it allows most browsers to "stream" the hdFm programs. (Code)

This is also documented elsewhere at https://stackoverflow.com/a/55239299.

The apparent side-effect is, it also alters the channel layout:

$ sox -n -r 44100 -c 2 silence.wav trim 0.0 5.0
$ mediainfo silence.wav |egrep "Channel"
Channel(s)                               : 2 channels
Channel layout                           : L R
$ ./alterWav silence.wav 
  modifying..
$ mediainfo silence.wav |egrep "Channel"
Channel(s)                               : 2 channels
Channel layout                           : L R C LFE Lb Rb Lscr Rscr Cb Ls Rs Tc Tfl Tfc Tfr Tbl Tbc Tbr

Thank you adopting what ffmpeg does, I am waiting for this update to trickle down to try it out.

Please let me know if you have any other feedback.

padenot commented 11 months ago

As far as I know, a streamable WAV has indeed a very large data chunk size, and implementations usually either display a very large duration, or take the hint that it's exactly 0xffffffff and don't display a duration. In the context of a Web Browser playing a file (vs. an infinite stream like a radio), or if the file is present on disk, we know the file size (via the Content-Length header, or just stat(...)ing), the sample type, the channel count, etc., and we're able to determine the exact duration anyway. Another use of this is for logging audio in a program, e.g. for debugging: write a WAV header with "infinite" duration, and then just append some interleaved audio buffer, don't even bother closing the file (or the program can even crash), most program will happily load it.

The real issue here is that we're in the presence of an WAVEFORMATEXTENSIBLE fmt sub-chunk and that the bytes that were overwritten were the channel layout (also 32-bits), not the size.

https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html has the layout of this struct (at the bottom of the page). If the format (named AudioFormat in your image) at offset 20 is 0xFFFE, then the "real" format is to be determined by a sub-format chunk present a few bytes after. This is the case in the file you provided as an example, and we also see that the sub-chunk size (byte 16) is 40 bytes, which is typical of WAVEFORMATEXTENSIBLE files. Another way to verify this is to load your file in https://ide.kaitai.io/, and choosing media -> wav in the folder tree on the left: this will break down a wav file in its components, allowing to easily see what's going on.

Regardless of all this, I want to make it clear that this was our fault and is a real regression in Firefox, this will be fixed in Firefox 121, released on the 19th of December this year. The fix will be present in Firefox Nightly, Developer Edition and Beta tomorrow, maybe even later today: https://www.mozilla.org/fr/firefox/channel/desktop/. Thanks for reporting it, we have community members and employees monitoring the /r/firefox sub-reddit, there's often interesting bug reports there.

evuraan / yellowShoes

This project generates wav files that are slightly incorrect #13