Open trapexit opened 2 months ago
I've been asked this before and what happens is I Google DVI4 and never really get enough information to make an informed reply.
It looks like DVI4 is a variation of the IMA ADPCM format with 3 simple differences. That part I understand. But the DVI4 data does not come in files (like a WAV file) but only seems to exist in RTP packets. I also understand that, in principle, but don't understand how that would apply to my program (which obviously operates on files only).
How would this work? Are there sample files of DVI4 data that I could get and potentially decode? Or does this data really only exist in packets, and if so how does information like sample rate and channel configuration get transmitted?
Thanks for the question; I'd definitely like to help if I can!
I'm not an expert in this space, and there seems to be a lot of confusion over naming and slight codec variations, so maybe I'm wrong, but I've only ever seen it in AIFC files. The 3DO documentation, tooling, and source code refer to it as standard "Intel/DVI", stored in AIFC.
I've attached examples: a 22050Hz AIFF, the Intel/DVI-compressed version in AIFC (made with a classic Mac app called Soundhack), and the same compressed data as raw. If you need anything more let me know.
References:
The 3DO primarily used two compressed audio formats. One is Intel/DVI ADPCM and the other is SDX2, developed by Phil Burk. ffmpeg can decode both but doesn't have encoders. It does have an adpcm_ima_ws encoder, which is clearly similar but not identical to Intel/DVI (the 3DO can play its output, but with distortion). My plan has been to write a bespoke encoder tool for the two and then try to upstream them into ffmpeg. I've not played with your encoder enough to know if it would be a material uplift in quality, but currently use of Intel/DVI is limited due to its generally poor quality, so what you mention in your docs is appealing given how memory constrained some titles can be (we have 2MB DRAM + 1MB VRAM).
Thanks for your time.
Thanks for the example files! I did a little experimenting with them and found the following:
FFmpeg tried to decode jax.22050.intel_dvi.aifc but it’s really distorted. It incorrectly identifies it as “adpcm_ima_ws” (like you suggest). FFmpeg seems to not support ADP4 for encode or decode.
I tried aifc2wav which says it can’t handle ADP4 but only IMA4 (which, interestingly, is what my program handles, except in the WAV container).
vgmstream decodes it fine.
I modified my program to swap the nibbles for encode and decode and then wrote a quick test to load the raw file jax.22050.intel_dvi.raw you sent and send it to my decoder. This generated the exact same output as vgmstream except for the last few samples (which I understand), so that’s a start!
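For anyone following along, here is a minimal sketch of what decoding such a headerless stream might look like. It uses the standard IMA ADPCM step and index tables. The function names are hypothetical, and the starting state (predictor 0, step index 0) and the high-nibble-first packing are assumptions based on this thread, not something confirmed against the 3DO SDK.

```python
# Standard IMA ADPCM tables.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def decode_nibble(nibble, predictor, index):
    """Decode one 4-bit code and return the updated (predictor, index)."""
    step = STEP_TABLE[index]
    diff = step >> 3
    if nibble & 1:
        diff += step >> 2
    if nibble & 2:
        diff += step >> 1
    if nibble & 4:
        diff += step
    if nibble & 8:
        predictor -= diff
    else:
        predictor += diff
    predictor = max(-32768, min(32767, predictor))
    index = max(0, min(88, index + INDEX_TABLE[nibble & 7]))
    return predictor, index


def decode_headerless(data, high_nibble_first=True, predictor=0, index=0):
    """Decode a raw, headerless 4-bit ADPCM stream to 16-bit samples.

    Assumption: decoding starts from predictor=0, index=0 because the
    stream carries no block headers to say otherwise.
    """
    samples = []
    for byte in data:
        hi, lo = byte >> 4, byte & 0x0F
        order = (hi, lo) if high_nibble_first else (lo, hi)
        for nibble in order:
            predictor, index = decode_nibble(nibble, predictor, index)
            samples.append(predictor)
    return samples


print(decode_headerless(bytes([0x70])))                           # high nibble first
print(decode_headerless(bytes([0x70]), high_nibble_first=False))  # low nibble first
```

Because the predictor and step index are carried from one nibble to the next and never reset, the state at any byte offset depends on everything before it.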
There are a couple things I don’t understand though. First, the raw file you sent had no ADPCM headers, not even one at the beginning, and a quick look at the vgmstream code made it look like ADP4 doesn’t use headers. That would mean that these files would not be seekable unless you decode from the very beginning. Is that true?
Also, it’s not obvious how stereo would work. Are there stereo ADP4 files?
Modifying my program to work with ADP4 data seems like it’s pretty straightforward. Just swapping the nibbles is a significant part of it, and that’s done. The bigger task would be switching from the WAV to AIFC container, because I assume that’s required by the hardware or the SDK, right? And figuring out how the headers work and how stereo is encoded (if it is).
I have a little time to look into this and maybe cobble together something crude to try. Do you have any longer samples (like 10-15 seconds) and any stereo samples that you could provide?
re number of channels: I can really only speak to what I see in the 3DO space and there we only have support for mono in 44100Hz and 22050Hz. You can see some details here.
I assume that’s required by the hardware or the SDK, right?
Actually it isn't. It would be nice to have both but the SDK is pretty flexible and you can give it a raw buffer and tell it what it is and it'll be happy. If you use aiff or aifc it will figure it out automagically.
I have a little time to look into this and maybe cobble together something crude to try. Do you have any longer samples (like 10-15 seconds) and any stereo samples that you could provide?
Sure, I can create a couple longer examples.
Thanks again. Will be interesting to compare. Right now we have a dev who is working on a homebrew port of Mortal Kombat 2 and has only just started messing with audio. He's tight on RAM so the 4:1 compression is appealing vs 2:1 SDX2. And since it's for sound effects, stereo isn't necessary. I'll upload the examples in a bit.
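To put rough numbers on the 4:1 vs 2:1 trade-off, here is a small arithmetic sketch. The 10-second clip length is purely an illustrative assumption and the function name is hypothetical. The bit depths follow from the ratios mentioned above relative to 16-bit PCM (4 bits per sample for ADP4, 8 bits per sample for SDX2).

```python
def encoded_bytes(seconds, sample_rate, bits_per_sample, channels=1):
    """Size in bytes of an audio buffer, rounded up to a whole byte."""
    total_bits = seconds * sample_rate * channels * bits_per_sample
    return (total_bits + 7) // 8


DRAM_BYTES = 2 * 1024 * 1024  # the 2MB DRAM figure mentioned in this thread

# Hypothetical 10-second mono sound effect at 22050Hz.
for name, bits in (("PCM16", 16), ("SDX2 (2:1)", 8), ("ADP4 (4:1)", 4)):
    size = encoded_bytes(10, 22050, bits)
    share = 100.0 * size / DRAM_BYTES
    print(f"{name:12s} {size:7d} bytes  ({share:.1f}% of DRAM)")
```

Under these assumptions the same clip takes 441000 bytes as 16-bit PCM, 220500 bytes as SDX2, and 110250 bytes as ADP4.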
Thanks for the samples! I will take a look at these soon and see what I can figure out. I might be able to quickly prototype something that you guys could try, especially if raw data is okay for now. If it turns out that the quality improvement is worth it, then I can add AIFC support.
Much appreciated. I picked music since I figured it would be easier to hear subtle differences.
Okay, I've created a new branch with the changes for you to try out. Basically I added an option -b0 to force headerless / blockless mode and -i to enable the swapped-nibbles of the Intel DVI4 variant. These require -r also to generate raw output, so no container. I have verified them using vgmstream specifying codec DVI_IMA and offset of 0x00. There is no way to convert them back to PCM using my tool because they're raw, but the -n option does actually decode them to calculate the quantization noise (kind of a double-check).
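As a rough picture of what the swapped-nibble packing means at the byte level, here is a tiny sketch. It assumes the two variants differ only in which 4-bit code goes in the upper half of each byte, which is how the thread describes it. The function name is hypothetical and this is not taken from the tool's source.

```python
def swap_nibbles(data: bytes) -> bytes:
    """Exchange the upper and lower 4 bits of every byte."""
    return bytes(((b & 0x0F) << 4) | (b >> 4) for b in data)


packed = bytes([0x12, 0xF0])
swapped = swap_nibbles(packed)
print(swapped.hex())                # each byte's two codes trade places
print(swap_nibbles(swapped).hex())  # swapping twice restores the input
```

Since the operation is its own inverse, the same helper would convert in either direction between the two packing orders.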
The command would be something like:
> adpcm-xq-0.5x.exe -6 -n -b0 -i -r source.wav output.ima
This otherwise works exactly like my standard tool so I didn't do any quality tests, and my opinion wouldn't really matter anyway because it's all based on how it works in your application. I'm attaching a zipped Windows executable (assuming GitHub lets me do that).
Good luck!
Thanks. We'll take a look.
@trapexit I have finished up a preliminary release of a DPCM encoder and checked it in here with a Windows binary. It turns out that with lookahead I was able to reduce the quantization noise by about 3 dB at most frequencies (1 dB at the very high end) so that's right on the threshold of worth-it. And of course the noise-shaping helps too. My guess is that older encoders did not do any of this (although I could be wrong, of course, since I didn't look).
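For a sense of scale on the dB figures, here is a small sketch of one common way to express quantization noise in dB and to convert a dB change into a power ratio. The function names are hypothetical, and measuring against 16-bit full scale is an assumption; it is not necessarily how the encoder's own noise display is computed.

```python
import math


def noise_db(original, decoded):
    """Mean error power relative to 16-bit full scale, in dB."""
    n = min(len(original), len(decoded))
    if n == 0:
        raise ValueError("need at least one sample")
    err_power = sum((o - d) ** 2 for o, d in zip(original, decoded)) / n
    if err_power == 0:
        return float("-inf")
    return 10.0 * math.log10(err_power / 32768.0 ** 2)


def db_to_power_ratio(db):
    """Convert a dB difference into a ratio of powers."""
    return 10.0 ** (db / 10.0)


print(round(db_to_power_ratio(3.0), 3))  # a 3 dB change in noise power
print(round(db_to_power_ratio(1.0), 3))  # a 1 dB change in noise power
```

A 3 dB reduction corresponds to roughly halving the noise power, while 1 dB is about a 21% reduction.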
Since the container support for DPCM is so bad I just set up everything as raw for now, but we can talk about that if it turns out to be a big problem (or you can just strip out the guts and put it into something else).
Please let me know if this is useful or you have issues or questions. Thanks!
Awesome. Will take a look. Thanks so much for your help. Really appreciated.
I've been working on a bespoke tool for our purposes with traditional, reference encoding for both adp4 and sdx2. I'll probably use that code as a basis to port to ffmpeg. Once that is done I'd like to incorporate your work as an alternative and keep the other encoders as a reference point of comparison if nothing else.
I'm glad to help; I hope this ends up being useful!
Of course, let me know if you run into trouble or have questions.
Just pushed a new release with an over 10x speedup (!) and a crashing bug fix.
I've played a bit with your encoder and unfortunately a couple of others and I couldn't really tell a difference between yours and the reference encoder algo. That said, we need to test it with some more samples to see if maybe it was just that one, and also test on original hardware to see how that sounds vs listening on a PC. Will likely look to incorporate it into my tooling regardless just in case.
Are you talking about the adp4 or sdx2 codecs? With some samples the difference can be subtle, and the measured difference is not as big with sdx2, but people definitely hear improvement with some samples. I recommend at least level 6 for the quality (or higher if you have the patience) and try it with and without the noise shaping (-f turns it off). If you specify -0 and -f you should get results identical to the reference encoder, and you can use -n to display the noise and get a good idea how much improvement you're getting.
Anyway, thanks for trying it out!
ADP4. Haven't gotten to SDX2 comparisons yet since the current homebrew being developed is just using ADP4.
Thanks for the suggestions. Will take a look.
Any chance of Intel/DVI4 support? We in the 3DO community are building new tooling for media conversion and the possibility of having a higher quality encoder for 4bit adpcm is appealing.
Thanks for your time.