Closed padenot closed 11 months ago
Wow, spot on! Thank you for taking the time to explain.
For perceptible hearing reasons, I prefer wav files - and for this project, since we are dealing with hdFM media, I also have to make the wav format "streamable". I accomplish this by changing the wav header. Although the resulting wav is slightly out of spec, it allows most browsers to "stream" the hdFm programs. (Code)
This is also documented elsewhere at https://stackoverflow.com/a/55239299.
The apparent side-effect is, it also alters the channel layout:
$ sox -n -r 44100 -c 2 silence.wav trim 0.0 5.0
$ mediainfo silence.wav |egrep "Channel"
Channel(s) : 2 channels
Channel layout : L R
$ ./alterWav silence.wav
modifying..
$ mediainfo silence.wav |egrep "Channel"
Channel(s) : 2 channels
Channel layout : L R C LFE Lb Rb Lscr Rscr Cb Ls Rs Tc Tfl Tfc Tfr Tbl Tbc Tbr
Thank you adopting what ffmpeg
does, I am waiting for this update to trickle down to try it out.
Please let me know if you have any other feedback.
As far as I know, a streamable WAV has indeed a very large data chunk size, and implementations usually either display a very large duration, or take the hint that it's exactly 0xffffffff
and don't display a duration. In the context of a Web Browser playing a file (vs. an infinite stream like a radio), or if the file is present on disk, we know the file size (via the Content-Length
header, or just stat(...)
ing), the sample type, the channel count, etc., and we're able to determine the exact duration anyway. Another use of this is for logging audio in a program, e.g. for debugging: write a WAV header with "infinite" duration, and then just append some interleaved audio buffer, don't even bother closing the file (or the program can even crash), most program will happily load it.
The real issue here is that we're in the presence of an WAVEFORMATEXTENSIBLE
fmt
sub-chunk and that the bytes that were overwritten were the channel layout (also 32-bits), not the size.
https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html has the layout of this struct (at the bottom of the page). If the format (named AudioFormat
in your image) at offset 20 is 0xFFFE
, then the "real" format is to be determined by a sub-format chunk present a few bytes after. This is the case in the file you provided as an example, and we also see that the sub-chunk size (byte 16) is 40 bytes, which is typical of WAVEFORMATEXTENSIBLE
files. Another way to verify this is to load your file in https://ide.kaitai.io/, and choosing media -> wav
in the folder tree on the left: this will break down a wav file in its components, allowing to easily see what's going on.
Regardless of all this, I want to make it clear that this was our fault and is a real regression in Firefox, this will be fixed in Firefox 121, released on the 19th of December this year. The fix will be present in Firefox Nightly, Developer Edition and Beta tomorrow, maybe even later today: https://www.mozilla.org/fr/firefox/channel/desktop/. Thanks for reporting it, we have community members and employees monitoring the /r/firefox
sub-reddit, there's often interesting bug reports there.
I'm a developer working on Firefox, I came across https://evuraan.info/evuraan/test/wav/audio.html and your reddit post.
The reason for this bug is that the wav files generated by this project are somewhat incorrect. Firefox used to accept them by chance, but after some changes I made we failed to decode them. I've written a patch to restore the previous behaviour (accept the invalid file and attempt to do something sensible with them), but I figured I'd open an issue here since this might well be an easy fix on your end. I've also written a test, so this won't regress in the future.
The problem is that the channel count and the channel mask in the
WAVEFORMATEXTENSIBLE
section of the file don't match. On https://evuraan.info/evuraan/test/wav/audio.html, we can see:mediainfo
says that the file as two channels, but that the channel layout lists a lot of channels, but because the number of channels is two it should readL R
.ffmpeg
clearly states that it's incorrect, but also explains what it does. This is also the approach that I implemented in Firefox: if the channel count and the channel mask disagree, disregard the channel mask, and instead use one of the "standard" mask: 2 channels map to stereo, 6 channels to 5.1, etc.A reference for this format can be found here: https://learn.microsoft.com/en-us/windows/win32/api/mmreg/ns-mmreg-waveformatextensible. Quoting from this page:
Lastly, if you see anything looking (or sounding!) weird in Firefox, please don't hesitate to open a bug at https://bugzilla.mozilla.org/enter_bug.cgi?product=Core (you can log in with your GitHub account), putting
padenot@mozilla.com
in theCC
field, if it has anything to do with audio or video in any capacity. Alternatively, you can reach me via email at padenot@mozilla.com.