Speech MORT format codec parsing/decoding

JackCarterSmith commented 1 year ago

Hi,

Okay everybody, it's a big one! Speech in RS is recorded in a proprietary codec named "MORT"; the "MORT" tag can be found in the header of the files inside the "speech" file within data.dat. Although it is used in a lot of Factor5's games, the MORT audio codec does not seem to have been fully decoded so far...

I haven't found anything more accomplished than a reverse-engineered toolset for the N64 version (https://github.com/jombo23/N64-Tools). Despite my efforts to recompile the tool myself, the wav output seems broken...

IDA/Ghidra don't get me any further than the speech file RAM allocation and some pointer/config data. I don't think the data is "complex" to decode, but the decoder functions seem to be fragmented across the main program... Pretty hard to extract.

I'll keep track of my "quest" in this issue; reverse engineering of RS isn't a widely covered topic on the web x)

JackCarterSmith commented 1 year ago

By adapting some of @SubDrag's code into a standalone testing tool, I was able to convert the MORT subfile into a "horrible" sound...

However, the general shape of the audio signal doesn't look like noise to me (not "random" enough). By stretching the sound as I went along, it was not too difficult to recognize a cry. I wasn't that far off when I said the sound was ugly!

I'm looking into adjustments to the wav generation and digging further into the C/asm code to better understand the MORT compression scheme.

Byte order: BE <- be aware that this is a rare big-endian (BE) file among the other little-endian (LE) types!

Header | Track list | Track data

Header (4B):
unsigned int [4B]: count of tracks

Track list (count of tracks x 4B):
repeat_for( count of tracks ) {
    byte [1B]: flags - only the first bit is used, switches between two sound speed factors (0: 0.2 / 1: 0.35)
    unsigned int [3B]: track offset from beginning of file
}

Track data (filesize - Header size - Track list size = xB):
MORT compressed data [xB]: audio data compressed using Factor5's audio codec.
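
For reference, the layout above can be read with a few lines of Python. This is only a sketch, not the tool used in this thread: `parse_speech` and `Track` are hypothetical names, "first bit" is assumed to mean the least significant bit of the flags byte, and each track is assumed to run until the next track offset (or the end of the file), which the layout itself does not state.

```python
import struct
from typing import List, NamedTuple


class Track(NamedTuple):
    flag: int     # assumed: least significant bit of the flags byte
    offset: int   # offset from the beginning of the speech file
    data: bytes   # raw MORT-compressed payload (not decoded here)


def parse_speech(blob: bytes) -> List[Track]:
    """Split a big-endian 'speech' container into its MORT tracks."""
    (count,) = struct.unpack_from(">I", blob, 0)
    entries = []
    for i in range(count):
        # Each entry is 1 flags byte followed by a 3-byte offset,
        # read here as a single big-endian 32-bit word.
        (word,) = struct.unpack_from(">I", blob, 4 + 4 * i)
        entries.append((word >> 24, word & 0xFFFFFF))

    # Assumption: a track ends where the next one starts (or at EOF).
    bounds = sorted({off for _, off in entries} | {len(blob)})
    tracks = []
    for flags, off in entries:
        if off >= len(blob):
            raise ValueError(f"track offset 0x{off:X} is past the end of the file")
        end = next(b for b in bounds if b > off)
        tracks.append(Track(flags & 1, off, blob[off:end]))
    return tracks
```

Each returned `data` field can then be written to its own file and fed to a MORT decoder.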

beepbeeporsomething.zip

dpethes commented 1 year ago

I don't have many notes on the speech compression. It was part of the MusyX package and iirc there was a special microcode by F5 for the N64's audio chip that handled sound and maybe MORT decoding as well? Anyway, great effort!

JackCarterSmith commented 1 year ago

Indeed, I read that on the N64 version a microcode was injected by the game cartridge. From what I know about the N64's architecture, the GPU does contain a programmable part for signal processing (like the shaders on today's graphics cards).

From what I could see and hear about Factor5's ways, I would be inclined to think that it is the multi-channel support of the MusyX engine, allowing the game's SFX, soundtracks and voices to be layered at a lower memory/computing cost. Just a hypothesis, but one that seems consistent with what I have observed in the PC code, interviews and remarks on various forums about Factor5's games.

Happy to share! I'm not a good Pascal programmer :P

JackCarterSmith commented 1 year ago

Little update:

The "bit flag" in the header corresponds to the sample rate (0 => 8000 Hz and 1 => 16000 Hz). This matches the values given in a dev interview 👍 I've adapted my debug tool to take it as an argument: the first argument is the MORT file manually extracted from the speech file and the second is 1 or 0, depending on the flag.

I haven't tested it on a Linux system, but I think it should work fine... MORTDecoder.zip
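
To illustrate how the flag drives the output, here is a minimal sketch that wraps already-decoded PCM in a wav container. It is not taken from MORTDecoder.zip: `pcm_to_wav` and `FLAG_TO_RATE` are hypothetical names, and mono, 16-bit little-endian samples are assumptions.

```python
import io
import wave

# Mapping reported above: flag 0 => 8000 Hz, flag 1 => 16000 Hz.
FLAG_TO_RATE = {0: 8000, 1: 16000}


def pcm_to_wav(pcm: bytes, flag: int) -> bytes:
    """Wrap decoded PCM samples in a wav file, using the flag for the rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)                        # assumption: mono speech
        w.setsampwidth(2)                        # assumption: 16-bit samples
        w.setframerate(FLAG_TO_RATE[flag & 1])   # only the low bit is used
        w.writeframes(pcm)
    return buf.getvalue()
```

Picking the wrong rate does not corrupt the samples; it only makes playback too slow or too fast.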

dpethes commented 1 year ago

Thanks for the code. I might give it a try and clean it up somewhat and put it in repo afterwards.

JackCarterSmith commented 1 year ago

> Thanks for the code. I might give it a try and clean it up somewhat and put it in repo afterwards.

I've just added the main part to handle the input and drive the "big" MORT decoder. That is where the bulk of the cleanup should happen. I'm not sure about the 4096 empty offset before the data at the decoder's input, but it doesn't work without it.

SubDrag commented 1 year ago

So the N64 Sound Tool can rip audio and speech from all MORT games. Rip is a strong word, it's kind of running through disassembled (at low level) code translated to C++ almost directly to spit out the output.

It would be nice if someone cleaned this all up and made an encoder though, instead of the sort of hacky way it's done.

Now anyways, what progress have you made here, or what are you trying to accomplish that's different? Other than a couple imperfect sounds here and there, and sampling rate being off due to games all having their own unique methods, 100% of N64 Sound can be ripped; I have no known misses.

JackCarterSmith commented 1 year ago

We are trying to reverse the RS data formats on the PC version (even if the N64 and PC code are similar: some of the N64 functions/elements are present on PC).

As you said, the parser acts like a "black box": data goes in and a stream comes out. I'd like to clean it up to get more human-readable code and write a note on the MORT codec algorithm. Perhaps we can make a MORT encoder after that...

For now, I can extract all the quotes/speech from the PC data using the "speech" file as input. I've kept searching for the MORT parser in the PC code, but with no result. Maybe I should try a signature-scanning approach...

SubDrag commented 1 year ago

Sounds good, good luck. I did this before Ghidra - using Ghidra might not be a bad idea for it to make the code a little cleaner, once you find it on PC - looks like Ghidra doesn't map that well to the pure ASM rip here.

It's a complex algorithm, with multiple stages, and something that happened long ago can impact something much later, and it's streaming. It also seems to grab different chunks/amounts of sound per framerate. It sure would be interesting to see this thing decoded for real and how it works.

If you're parsing this on the PC version, a MORT decoder must exist...

Maybe try and find these tables:
table8004867C[0x00] = 0xC7;
table8004867C[0x01] = 0xD7;
table8004867C[0x02] = 0xE3;
table8004867C[0x03] = 0xE7;
table8004867C[0x04] = 0xF1;
table8004867C[0x05] = 0xF3;
table8004867C[0x06] = 0xF5;
table8004867C[0x07] = 0xF7;

There are also constants in Function80048B3C: 0x2B33, 0x4E66, 0x6600
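
A signature scan for these values could look roughly like the sketch below. The element width of the table in the PC binary is not known from this thread, so several encodings are tried, and the constants are only searched as 32-bit little-endian immediates (a 16-bit search would produce far too many false hits). `scan`, `find_all` and `candidate_patterns` are hypothetical helper names.

```python
import struct

TABLE = bytes([0xC7, 0xD7, 0xE3, 0xE7, 0xF1, 0xF3, 0xF5, 0xF7])
CONSTANTS = (0x2B33, 0x4E66, 0x6600)


def find_all(haystack: bytes, needle: bytes):
    """Yield every offset at which needle occurs in haystack."""
    pos = haystack.find(needle)
    while pos != -1:
        yield pos
        pos = haystack.find(needle, pos + 1)


def candidate_patterns() -> dict:
    """Byte patterns to look for; the encodings are guesses, not facts."""
    patterns = {
        "table as u8": TABLE,
        "table as u16 LE": b"".join(struct.pack("<H", v) for v in TABLE),
        "table as u32 LE": b"".join(struct.pack("<I", v) for v in TABLE),
    }
    for c in CONSTANTS:
        patterns[f"const 0x{c:04X} as u32 LE"] = struct.pack("<I", c)
    return patterns


def scan(image: bytes) -> dict:
    """Map each pattern name to the list of file offsets where it was found."""
    return {name: list(find_all(image, pat))
            for name, pat in candidate_patterns().items()}
```

The offsets returned are file offsets, so they still have to be translated to virtual addresses before looking them up in Ghidra.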

SubDrag commented 1 year ago

See this commit to fix the starting too early sound issue: https://github.com/jombo23/N64-Tools/commit/4fe107327f8a602097bed6ab57374612dc4d707a

See if you can make the appropriate adjustments to fix the issue. Both of the MORT sounds you posted above exactly match the N64 ones, and they are shown as 8000/16000 properly in N64 Sound Tool. The sample rate is at MORT header +0x6. For example, in 4D4F525404E43E80, 0x3E80 is 16000 in the sample you showed.

So are you ripping it properly now, or is it still mostly garbage that merely sounds like a real sound? The MORT samples you posted above match the N64 ones exactly, so you should be able to rip these perfectly. matchingn64rip.zip Here is the rip from N64 - your wav should match it byte for byte?
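
As a quick check of the header field mentioned above, the sketch below reads a big-endian 16-bit value at offset 0x6 of a MORT track. `mort_sample_rate` is a hypothetical helper, and the meaning of the two bytes at +0x4 (0x04E4 in the example) is not covered in this thread, so they are left alone.

```python
import struct


def mort_sample_rate(track: bytes) -> int:
    """Return the sample rate stored at MORT header +0x6 (big-endian u16)."""
    if track[:4] != b"MORT":
        raise ValueError("missing MORT magic")
    (rate,) = struct.unpack_from(">H", track, 6)
    return rate


header = bytes.fromhex("4D4F525404E43E80")
print(mort_sample_rate(header))  # 16000
```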

MORT is fully software decoding btw - not using microcode on N64. And FYI the ASM rip is from Pokemon Stadium US 1.0 on N64 (addresses match that). I assume you are trying to find the algorithm on PC to use ghidra/decompile though.

JackCarterSmith commented 1 year ago

It's already great work, given that it's the only usable code for the MORT codec that I have been able to find so far!

That's the point! From what I know of Factor5's methods, they are largely optimization-oriented; I wouldn't be surprised to find some "tricks" in the data like "pattern reuse" or other such things. But thanks for the tips about the constant values of the MORT decoder, I can use them to locate the portion of code that processes it! I'll keep track of my progress in this topic, but it takes time.

Yes, I got a clean sound after setting the sample rate to 8000 or 16000 depending on the specific header in the PC version. NiceRogue.zip I only have the "index" numbers of the tracks; the addresses differ between N64 and PC. But you got 16000 tracks in the N64 version? I compared the files and they are perfectly the same 💯 (except for the last "smpl" chunk, which I deliberately truncated as it's useless; I suppose it's used by the engine as a generic property? I don't remember if it's the same for all tracks...).

I was wondering whether it would have been possible/useful on N64, but yes, on PC the decoder is necessarily present if there are MORT files in the game data. I hope I can find it soon to get a basis for comparison with the instructions extracted from the N64 asm.

EDIT: I've located the 0x2B33, 0x4E66, 0x6600 parameters with Ghidra at function offset 0x5bce13. I see similarities in the structure of the function calls with the ripped N64 version.

SubDrag commented 1 year ago

So the actual N64 game outputs raw 16-bit sound data. My toolchain spits out wav files, so that's a wav chunk. smpl is used for loops. Though this game doesn't have loops in there as far as I know, so it's all nulls, and not useful. OK good luck! Note that Ghidra kind of...lumps functions together, so it's not a trivial 1:1 mapping. Especially if you're comparing x86_64, but anyways, you have reference output, which should help hopefully. Good luck! I really would love to understand the algorithm, and have an encoder, but no small feat.
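
Since the only expected difference between the two rips is the trailing `smpl` chunk, a comparison that ignores it can be sketched as follows. `riff_chunks` and `same_audio` are hypothetical helpers; the sketch assumes plain RIFF/WAVE files and keeps only the first chunk of each type.

```python
import struct


def riff_chunks(wav: bytes) -> dict:
    """Return {chunk_id: payload} for the top-level chunks of a wav file."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = {}
    pos = 12
    while pos + 8 <= len(wav):
        chunk_id = wav[pos:pos + 4]
        (size,) = struct.unpack_from("<I", wav, pos + 4)
        chunks.setdefault(chunk_id, wav[pos + 8:pos + 8 + size])
        pos += 8 + size + (size & 1)  # chunks are padded to an even size
    return chunks


def same_audio(a: bytes, b: bytes) -> bool:
    """True if both files share the same format and sample data."""
    ca, cb = riff_chunks(a), riff_chunks(b)
    return (ca.get(b"fmt ") == cb.get(b"fmt ")
            and ca.get(b"data") == cb.get(b"data"))
```

This compares only `fmt ` and `data`, so a file with an all-null `smpl` chunk and one without it are treated as the same audio.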

JackCarterSmith commented 1 year ago

Loop instruction? Interesting... That's the most complex part of the RS reverse engineering process, no doubt. But, as you say: the most awesome part! A lot of time to extract, test and compare isolated code, the MORT encoder should be the conclusion of this "quest". May the force be with us!

JackCarterSmith commented 1 year ago

I've finished my first pass on the main class. I removed a lot of redundant variables and added loops where necessary.

It can still process Rogue data correctly; I don't know if it still works with other N64 games @SubDrag.

Some parts still seem unclear... Perhaps I should try to clean it up once more. MORTDecoder.zip

SubDrag commented 1 year ago

Yeah it works on all MORT games on N64. If you get something pretty high level would be interesting to see, but it's definitely a long shot to support an encoder, but maybe possible if you spend enough effort!

JackCarterSmith commented 1 year ago

> Yeah it works on all MORT games on N64. If you get something pretty high level would be interesting to see, but it's definitely a long shot to support an encoder, but maybe possible if you spend enough effort!

Ah? Did you try the new "middle-level" sources I posted with my answer? Yeah, the encoder is more of a bonus challenge for me; the big one is to clearly understand the MORT encoding through decoder analysis! And I'm a very big fan of F5's work. Old tech certainly, but it's a good XP ref.

SubDrag commented 1 year ago

I didn't try it, but presumably you tested if it matches the output; if so it's a valid update.

SubDrag commented 3 days ago

There's a very poor importer now too. The algorithm is unknown, so brute-forcing is the best bet right now. Sadly, computers are not fast enough for a full brute force. A partial brute force gets iffy results, but at least it's something that can import sound of some fashion.

JackCarterSmith commented 3 days ago

@SubDrag Are you talking about an importer/decoder other than the one you decompiled from the N64 version? I spent a good while trying to establish a mathematical model of the decoder... Without going so far as to say that I'm at a standstill, my lack of advanced knowledge in the field of audio compression makes the analysis long and "painful?" :D At the time of writing, the general structure seems very similar to LPC encoding, but greatly obfuscated by the fixed-point implementation...
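
For readers unfamiliar with the comparison, here is what a generic LPC synthesis step looks like in fixed point. This is a textbook all-pole filter with Q15 coefficients, shown only to illustrate the kind of structure meant here; it is not the MORT algorithm, and the function name, the Q15 format and the coefficient layout are all assumptions.

```python
from collections import deque


def lpc_synthesize_q15(excitation, coeffs_q15):
    """Generic all-pole filter: y[n] = e[n] + sum_k a[k] * y[n-1-k].

    coeffs_q15 holds the a[k] as Q15 integers (32768 represents 1.0).
    Output samples are saturated to the signed 16-bit range.
    """
    order = len(coeffs_q15)
    history = deque([0] * order, maxlen=order)  # most recent output first
    out = []
    for e in excitation:
        acc = e << 15                      # bring the excitation to Q15 scale
        for a, y_prev in zip(coeffs_q15, history):
            acc += a * y_prev              # multiply-accumulate
        y = acc >> 15                      # back to sample scale (floors)
        y = max(-32768, min(32767, y))     # saturate like a 16-bit DSP would
        out.append(y)
        history.appendleft(y)
    return out
```

In decompiled code the shifts and saturation steps tend to appear as separate integer operations, which is one reason such a filter can be hard to recognize.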

SubDrag commented 3 days ago

There was no importer before - I just recently added a brute-forced version in an attempt to get something importing. I never made any progress either in understanding their advanced algorithm, so this was the best I could do.
https://github.com/jombo23/N64-Tools/commit/6d51dcb5b3cf3f23dd24107fb214b0e698a86860 That commit cleaned up some of the bit-reading assembly code to be more human readable as well.

JackCarterSmith commented 3 days ago

Ah yes, I remember now how I used it 2 years ago! You've done a good job of rewriting much of the pseudo-assembler code too!

Yeah, the "bitstream" part of the decoder is a big mess, probably Huffman-style compression. I have made a graph of all the "known samples" just after the bit sampling; they look like reference curves... Use redundancy on them, probably apply a (DSP) filter afterwards, and you should get a synthesized human voice.

That's a lot of speculation based on the code and what I know of signal processing. It's only a matter of time before figuring it out!

dpethes / rerogue

Speech MORT format codec parsing/decoding #13