dbry / audio-resampler

Simple audio resampler targeting embedded systems
BSD 3-Clause "New" or "Revised" License

Unsupported WAV format #2

Open sergeevabc opened 7 months ago

sergeevabc commented 7 months ago

Windows 7 x64, ART 0.2, SoX 14.4.2

$ sox --null --rate 48000 sin.wav synth 30 sin 6000 vol -10dB

$ mediainfo sin.wav
Format settings                          : WaveFormatExtensible
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 00000001-0000-0010-8000-00AA00389B71
Duration                                 : 30 s 0 ms
Bit rate mode                            : Constant
Bit rate                                 : 1 536 kb/s
Channel(s)                               : 1 channel
Channel layout                           : C
Sampling rate                            : 48.0 kHz
Bit depth                                : 32 bits
Stream size                              : 5.49 MiB (100%)

$ art -v -4 -o16 -r44100 sin.wav art-4.wav
"sin.wav" is an unsupported .WAV format!

Maybe WaveFormatExtensible confuses ART?

$ sox --null --type wavpcm --rate 48000 sin.wav synth 30 sin 6000 vol -10dB

$ mediainfo sin.wav
Format settings                          : PcmWaveformat
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 30 s 0 ms
Bit rate mode                            : Constant
Bit rate                                 : 1 536 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 48.0 kHz
Bit depth                                : 32 bits
Stream size                              : 5.49 MiB (100%)

$ art -v -4 -o16 -r44100 sin.wav art-4.wav
"sin.wav" is an unsupported .WAV format!

Maybe float will do?

$ sox --null --type wavpcm --encoding float --rate 48000 sin.wav synth 30 sin 6000 vol -10d

$ mediainfo sin.wav
Format settings                          : WaveFormatEx
Format                                   : PCM
Format profile                           : Float
Codec ID                                 : 3
Codec ID/Hint                            : IEEE
Duration                                 : 30 s 0 ms
Bit rate mode                            : Constant
Bit rate                                 : 3 072 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 48.0 kHz
Bit depth                                : 64 bits
Stream size                              : 11.0 MiB (100%)

$ art -v -4 -o16 -r44100 sin.wav art-4.wav
"sin.wav" is an unsupported .WAV format!

Hmmm. Float becomes 64-bit by default, so let's try lowering that to 32-bit.

$ sox --null --type wavpcm --encoding float --bits 32 --rate 48000 sin.wav synth 30 sin 6000 vol -10d

$ mediainfo sin.wav
Format settings                          : WaveFormatEx
Format                                   : PCM
Format profile                           : Float
Codec ID                                 : 3
Codec ID/Hint                            : IEEE
Duration                                 : 30 s 0 ms
Bit rate mode                            : Constant
Bit rate                                 : 1 536 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 48.0 kHz
Bit depth                                : 32 bits
Stream size                              : 5.49 MiB (100%)

$ art -v -4 -o16 -r44100 sin.wav art-4.wav
format tag size = 18
FormatTag = 0x3, NumChannels = 1, BitsPerSample = 32
BlockAlign = 4, SampleRate = 48000, BytesPerSecond = 192000
cbSize = 0, ValidBitsPerSample = 16289
extra unknown chunk "fact" of 4 bytes
num samples = 1440000
resampling 1-channel file "sin.wav" (32b/48k) to "art-4.wav" (16b/44k)...
1024-tap sinc downsampler with lowpass at 21829.5 Hz
...completed successfully

At last!

Dear @dbry, why didn't ART accept the initial file? I created it using various online tutorials. And to create one that would work, I had to sweat.

dbry commented 7 months ago

Haha, sorry, you got unlucky there! It's not WaveFormatExtensible though, it's just the data formats.

The resampling tool uses 32-bit floats internally (it's targeted at embedded, real-time applications) and I decided to only support formats that can be losslessly converted to/from 32-bit floats. That includes everything reasonably common, but does exclude 32-bit integers and 64-bit anything. Those are uncommon (at least outside of SoX output, it seems) and not very useful (at least in my opinion). But more importantly, any advantage of those formats would be lost after converting with ART (because of the internal representation), so that's where the decision comes from. I suggest using another tool if those formats are important to you. Like maybe SoX... :smile:
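
To illustrate the "losslessly converted to/from 32-bit floats" rule, here is a minimal Python sketch (not ART's actual code; the helper name `survives_float32` is made up for this example). It relies on the fact that an IEEE 754 32-bit float has a 24-bit significand, so every 24-bit integer sample round-trips exactly, while full-range 32-bit integers do not.

```python
import struct

def survives_float32(value: int) -> bool:
    """Return True if the integer round-trips through a 32-bit float unchanged."""
    packed = struct.pack("<f", float(value))      # round to the nearest float32
    return int(struct.unpack("<f", packed)[0]) == value

# Largest positive sample value for each signed integer width.
max_24bit = 2**23 - 1
max_32bit = 2**31 - 1

print(survives_float32(max_24bit))  # 24-bit PCM fits in the 24-bit significand
print(survives_float32(max_32bit))  # 32-bit PCM gets rounded, so it does not
```

The same reasoning applies to 64-bit floats: narrowing them to 32 bits generally discards significand bits, so any extra precision in the source would be lost inside the resampler.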

sergeevabc commented 7 months ago

Hmmm.

$ mediainfo song.flac
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Duration                                 : 5 min 12 s
Bit rate mode                            : Variable
Bit rate                                 : 1 684 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 48.0 kHz
Bit depth                                : 24 bits
Compression mode                         : Lossless
Stream size                              : 62.7 MiB (100%)
Writing library                          : libFLAC 1.4.3 (2023-06-23)

$ flac -d song.flac

$ mediainfo song.wav
Format settings                          : WaveFormatExtensible
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 00000001-0000-0010-8000-00AA00389B71
Duration                                 : 5 min 12 s
Bit rate mode                            : Constant
Bit rate                                 : 2 304 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 48.0 kHz
Bit depth                                : 24 bits
Stream size                              : 85.8 MiB (100%)

$ art -4 -r44100 -o16 song.wav song.resampled.wav
                  ^
                 (digression: intuitively,
                  b is for bits, o is for output file,
                  but here b is for Blackman-Harris,
                  which is used by default anyway)

$ mediainfo song.resampled.wav
Format settings                          : WaveFormatExtensible
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 00000001-0000-0010-8000-00AA00389B71
Duration                                 : 5 min 12 s
Bit rate mode                            : Constant
Bit rate                                 : 1 411.2 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Bit depth                                : 16 bits
Stream size                              : 52.6 MiB (100%)

Dear @dbry, why is the output WaveFormatExtensible? For 44.1 kHz 16-bit audio I expect to get PcmWaveformat according to that document. Otherwise, I cannot compress this WAV file using the Helix MP3 encoder, an alternative to LAME.

$ hmp3 song.resampled.wav -V150 -HF2 -U2
hmp3 MPEG Layer III audio encoder 5.2.1, 2022-12-19

<press any key to stop encoder>
PCM input file: song.resampled.wav
MPEG ouput file: song.resampled.mp3
pcm file:  channels = 2  bits = 16,  rate = 44100  type = 1
UNSUPPORTED PCM FILE TYPE

There is a dirty workaround: we can force FLAC to decompress 48 kHz 24 bit into PcmWaveformat in the first place using --force-legacy-wave-format flag, but actually it goes against the specification I linked above.

dbry commented 7 months ago

The McGill document does say that 24-bit WAV files should be WaveFormatExtensible, and other people repeat it, but I have not seen that in any Microsoft document and there's nothing in the WaveFormatExtensible structure that's required or useful for 24-bit files (and other than FLAC's warning, I've never seen a program refuse to take a 24-bit audio file because of the simple header). And the simple WAV header lists IEEE float data, so that certainly doesn't make sense if it can't go over 16 bits.

My preference has always been to write the simplest header possible (without ambiguities) to prevent the problem you're running into with Helix and lots of other software (my Cool Edit 2000 won't read or write WaveFormatExtensible and works fine with 24-bit audio, not to mention 20-bit and 32-bit float). The reason that ART writes an extensible header in this case is that the source file had one and specified a channel mask, so I'm not really following my own rule here.
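
For readers unfamiliar with the two header variants discussed here, the following Python sketch builds both "fmt " chunk bodies for the same audio parameters. It is an illustration only, not ART's code: the function names are hypothetical, and the channel mask `0x3` (front left + front right) is an assumed value for a stereo file. The GUID bytes correspond to the Codec ID that mediainfo prints above, `00000001-0000-0010-8000-00AA00389B71`.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE
# Last 14 bytes of the SubFormat GUID as stored on disk; the first 2 bytes
# of the GUID carry the ordinary format code (1 = PCM, 3 = IEEE float).
GUID_TAIL = bytes.fromhex("000000001000800000AA00389B71")

def simple_fmt(channels, rate, bits, tag=WAVE_FORMAT_PCM):
    """Build the 16-byte legacy 'fmt ' body."""
    block_align = channels * ((bits + 7) // 8)
    return struct.pack("<HHIIHH", tag, channels, rate,
                       rate * block_align, block_align, bits)

def extensible_fmt(channels, rate, bits, channel_mask, tag=WAVE_FORMAT_PCM):
    """Build the 40-byte WAVEFORMATEXTENSIBLE 'fmt ' body."""
    base = simple_fmt(channels, rate, bits, WAVE_FORMAT_EXTENSIBLE)
    # cbSize = 22: valid bits (2) + channel mask (4) + SubFormat GUID (16).
    extra = struct.pack("<HHIH", 22, bits, channel_mask, tag) + GUID_TAIL
    return base + extra

def format_code(fmt_body):
    """Return the effective format code, looking inside an extensible header."""
    tag = struct.unpack_from("<H", fmt_body, 0)[0]
    if tag == WAVE_FORMAT_EXTENSIBLE and len(fmt_body) >= 26:
        return struct.unpack_from("<H", fmt_body, 24)[0]
    return tag

legacy = simple_fmt(2, 44100, 16)
ext = extensible_fmt(2, 44100, 16, channel_mask=0x3)
print(len(legacy), len(ext))                   # 16 40
print(format_code(legacy), format_code(ext))   # 1 1
```

Both bodies describe the same 16-bit PCM audio. A reader that only checks the first two bytes sees `0xFFFE` in the extensible case and may reject the file, which is one plausible explanation for the Helix failure above.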

Pushed commit fixing this and also something I just found where I wasn't padding odd-size data chunks (24-bit mono with an odd number of samples).
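
The padding issue mentioned above comes from the RIFF rule that chunks start on even byte offsets: a chunk with an odd payload size is followed by one pad byte that is not counted in the chunk's size field. A minimal Python sketch of that rule (the helper `riff_chunk` is hypothetical, not taken from the ART source):

```python
import struct

def riff_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Serialize one RIFF chunk, adding the pad byte that odd-sized payloads need."""
    # The size field holds the payload length only; the pad byte is excluded.
    header = struct.pack("<4sI", chunk_id, len(payload))
    pad = b"\x00" if len(payload) % 2 else b""
    return header + payload + pad

# 24-bit mono with 3 samples gives a 9-byte payload, so one pad byte is appended.
chunk = riff_chunk(b"data", bytes(9))
print(len(chunk))  # 8-byte header + 9 data bytes + 1 pad byte
```

Without the pad byte, any chunk written after the data chunk would start on an odd offset, and strict readers could fail to find it.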

One thing I should add is that ART is really intended as a demo / test harness for the sample-rate converter. It's not really intended to be a general purpose sample-rate converter tool, and as such there are plenty of things missing (e.g., copying other RIFF chunks or tags, handling RF64 or BW64 files, etc). That said, if you find any other bugs please let me know.

Thanks!