dbry / adpcm-xq

Xtreme Quality IMA-ADPCM Encoder / Decoder
BSD 3-Clause "New" or "Revised" License

Ingame audio: Bugs Bunny & Taz: Time Busters PC #21

Open rubinho146 opened 6 days ago

rubinho146 commented 6 days ago

I started converting everything one by one, but I ran into problems with some of the audio. Some WAV files don't convert properly, and the volume seems to drop. I'm attaching samples so you can have a look: the original WAV and IMA conversion from the game, plus my WAV and its conversion. I've also included a txth file so you can listen to the audio in Foobar2000 with the vgmstream plugin.

The command I'm using is "adpcm-xq -e -r -b15 FILE.wav FILE_NEW.wav". Don't mind the AIFF header on the original file; the devs did that to trick the engine in some way or another. Let me know if this is possible to solve. You can play my IMA-converted WAV and hear that the audio is quieter and ends with a small "tik". This happens with some other audio files I have edited, but not with the original. Might it be something dynamic in your tool?

Cheers.

Samples.zip

dbry commented 5 days ago

Okay, I’ve looked over all these files and done some experiments and have some new information from issue #19 and finally think I know exactly what’s going on. I’m using the command-line version of vgmstream because I don’t normally run Windows, but I think it works the same.

So, let’s start in your “original” directory. There’s a file in there called “Granny.bze_000A 00007.ima” which, as you said, contains an AIFF header and then IMA audio. My guess is that header was put on there by whatever program extracted the IMA audio from the original game. The actual IMA starts at offset 0x52; however, the txth file you included specifies an offset of 0x36, which is too small. If you look in the file after that point you will see the SSND chunk start, and if you zoom into the decoded WAV file at 0.0015 seconds you can see a little shift there. That’s the “SSND” being decoded as IMA data (!) and is not supposed to be there. Change the offset to 0x52 and that will be fixed in the WAV file. Maybe this affects other files?
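For checking the offset across many extracted files, here is a minimal Python sketch that walks the chunks of an AIFF-style header and reports where the sound data begins. It assumes the header is a well-formed AIFF/AIFC `FORM` container, which may not hold for every file from the game; the function name `find_ssnd_data_offset` is made up for illustration and is not part of adpcm-xq or vgmstream.

```python
import struct


def find_ssnd_data_offset(data: bytes) -> int:
    """Return the byte offset of the first audio byte after the SSND chunk.

    Assumption: `data` begins with a well-formed AIFF/AIFC FORM header.
    """
    if len(data) < 12 or data[0:4] != b"FORM" or data[8:12] not in (b"AIFF", b"AIFC"):
        raise ValueError("not an AIFF/AIFC header")

    pos = 12  # first chunk follows "FORM", the form size, and the form type
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        if chunk_id == b"SSND":
            if pos + 16 > len(data):
                raise ValueError("truncated SSND chunk")
            # SSND payload starts with two 32-bit fields: offset and blockSize.
            offset, _block_size = struct.unpack(">II", data[pos + 8:pos + 16])
            return pos + 16 + offset
        # Chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("no SSND chunk found")
```

Calling this on the first few hundred bytes of a file and writing the result into the txth offset field would avoid decoding header bytes as IMA data, provided the assumption about the header holds.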

What I discovered in that other issue #19 is that some applications handle IMA data differently from the way it’s used in WAV files. Specifically, the standard IMA headers are gone and the IMA data is not divided into blocks (or frames). I’m not talking about WAV or AIFF headers here, I’m talking about the little 4-byte header for each block that provides the first sample and the initial index into the scalar table. It seems that in these cases the scalar index is assumed to start at zero, the first sample is in the IMA data (not the header), and there are no blocks (it just goes on and on until the end of the audio). My tool does not create this format. The “raw” option only causes the tool to not put on a WAV header...it still generates the same IMA headers and blocks. When you specified -b15 you set the largest possible block size (65529 samples), but if you tried a file longer than that (4.1 seconds at 16 kHz) you would still get to the end of the frame and get a glitch.
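The 65529-sample figure can be reproduced with a little arithmetic. This sketch assumes that -bN selects a block of 2^N bytes, that each channel gets a 4-byte block header carrying one sample, and that the remaining bytes hold two 4-bit samples each; `samples_per_block` is a name made up for the example.

```python
def samples_per_block(block_bits: int, channels: int = 1) -> int:
    """Samples per channel in one IMA ADPCM block of 2**block_bits bytes."""
    block_bytes = 1 << block_bits
    header_bytes = 4 * channels           # one 4-byte header per channel
    data_bytes = block_bytes - header_bytes
    nibbles_per_channel = data_bytes * 2 // channels
    return nibbles_per_channel + 1        # +1 for the sample stored in the header


samples = samples_per_block(15)           # mono, -b15
print(samples)                            # 65529
print(samples / 16000)                    # about 4.1 seconds at 16 kHz
```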

If you don’t specify the “raw” option for my tool and generate IMA in WAV, then vgmstream has no trouble playing the files correctly and even shows the frame size (on the command-line version). So vgmstream understands both formats, but I’m guessing that your target (a console?) wants these raw, unbroken streams.

Now, if there’s some silence at the beginning of the audio the header ends up being all zeros (like in this case) and so playing the file generated with my tool in “raw” mode using vgmstream works okay (although you should set the offset to 0x4 to skip just the initial header). Assuming you don’t go past 4 seconds, of course.

But what happens if the beginning of the audio is not silence? That brings us to your “edited” directory. The WAV file there is not silence at the beginning and the 4-byte header generated by my tool contains important information about decoding the file, including the initial scalar table index (which is why the volume is wrong!) Again, if you tell my tool to generate a WAV file then vgmstream has no problem because it understands those headers in WAV files, but not here. Even if you set the offset to 0x00 it still does not work right because vgmstream is not expecting any IMA headers in that mode.
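To make the volume effect concrete, here is a minimal Python sketch of an IMA ADPCM nibble decoder using the standard step and index tables. It is not taken from adpcm-xq or vgmstream. It assumes the low nibble of each byte is decoded first and that the 4-byte block header is a little-endian 16-bit sample, a one-byte table index, and a reserved byte; the function names are invented for the example.

```python
import struct

STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def parse_block_header(header: bytes):
    """Split a 4-byte block header into (first sample, table index)."""
    predictor, index, _reserved = struct.unpack("<hBB", header[:4])
    return predictor, index


def decode_ima_nibbles(data: bytes, predictor: int = 0, index: int = 0):
    """Decode mono IMA nibbles, low nibble first (an assumption)."""
    samples = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            step = STEP_TABLE[index]
            diff = step >> 3
            if nibble & 1:
                diff += step >> 2
            if nibble & 2:
                diff += step >> 1
            if nibble & 4:
                diff += step
            predictor += -diff if nibble & 8 else diff
            predictor = max(-32768, min(32767, predictor))
            index = max(0, min(88, index + INDEX_TABLE[nibble & 7]))
            samples.append(predictor)
    return samples
```

Decoding the same nibbles once with the header's values and once with `predictor=0, index=0` shows the difference: with index 0 the step size starts at 7 and needs many samples to grow, so the start of the audio comes out much quieter than intended.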

BTW, as an aside, I see that you are re-encoding audio that has already been encoded with ADPCM. In your original directory it turns out that my encoder encoded the audio exactly like it was encoded the first time (even the SSND chunk, which is not supposed to be there, is identical, except that the whole thing is shifted one nibble because of the extra sample that my tool puts in the header). This is good because the sample will sound identical. But this might not always work, especially if you perform some edits on the file like changing the volume or doing EQ. In that case, you will end up adding the noise of ADPCM encoding again, and you might hear some degradation. Using the high lookahead values of my tool should help, but it will never sound as good as the original.

But anyway, if you actually got this far, you might be asking, what is the solution? Assuming that your audio must be in this “headerless, frameless” format my tool is not going to work (unless you put a little silence at the beginning of every file and never go past 65529 samples). But I am thinking that I might be able to quickly cobble together a version of my tool that generates raw data in this format. It might not be able to decode it but I (and you) can use vgmstream to verify the output. Give me a day or two and I’ll let you know how that goes.

rubinho146 commented 5 days ago

Damn, I did a test here and it is exactly what you said. It needs to be dead silence (I muted the first milliseconds) and then it converted like a charm. But this isn't always viable, as I sometimes have to use that beginning to fit the voices within the time limit of the original. If you can do a little "fix" in this regard, it would indeed help a lot with the conversion process. It's a lot of files (maybe more than 3000 audio files) to convert. (About the AIFF, just ignore that header, as it was used by the developers to trick the game's engine into loading the sound.)

Thanks for looking into it, will wait. Cheers!

dbry commented 3 days ago

Okay, I've created a new branch with the changes for you to try out. Basically I added an option -b0 to force headerless / blockless mode. This also requires -r to generate raw output, so there is no container. I have verified the output files using your vgmstream txth file with an offset of 0x00. There is no way to convert them back to PCM using my tool because they're raw, but the -n option does actually decode them to calculate the quantization noise (kind of a double-check).

Stereo is also supported with an interleave of 0x04. Not sure if you're using that or whether you can set the interleave, but it's there.

The command would be something like:

> adpcm-xq-0.5x.exe -6 -n -b0 -r source.wav output.ima
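With a few thousand files to convert, the same command could be run in a loop. This is only a sketch in Python: it reuses exactly the flags from the command above, assumes the executable is reachable under the given name, and assumes all sources are `.wav` files in a single directory. The function names and directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(exe, src, dst):
    """Build the argument list for one headerless, blockless conversion."""
    return [str(exe), "-6", "-n", "-b0", "-r", str(src), str(dst)]


def convert_tree(exe, src_dir, dst_dir):
    """Convert every .wav in src_dir; return the sources that failed."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(src_dir).glob("*.wav")):
        dst = dst_dir / (src.stem + ".ima")
        result = subprocess.run(build_command(exe, src, dst))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Something like `convert_tree("adpcm-xq-0.5x.exe", "edited", "converted")` would then process a whole folder, and the returned list shows which files need a second look.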

I'm attaching a zipped Windows executable (assuming GitHub lets me do that).

adpcm-xq-0.5x.zip

Good luck!

rubinho146 commented 3 days ago

Hi, David! I've been testing it on a couple of audio files, converted lots of them, and it is working super great. There are no more audio glitches now. This might also work for two more games from the same developer. Will report back with news if I manage to convert them all. Big thanks! Cheers.

dbry commented 2 days ago

Glad to hear it, thanks for letting me know!