LazyDuchess / OpenTS2

Open source re-implementation of The Sims 2 in Unity
Mozilla Public License 2.0

Figure out SPX (Speech) Audio format #2

LazyDuchess closed this 2 weeks ago

LazyDuchess commented 1 year ago

I have no experience figuring out audio formats, so this is something I will need help with. In the TSData/Res/Sound folder there are Voice1, Voice2, etc. packages. Inside them are audio files that begin with an SPX1 magic number. These are the audio files left to figure out.

ammaraskar commented 1 year ago

Aah, these look to be Speex (https://www.speex.org/) files with a custom header instead of the usual Ogg container? I haven't tried decoding them yet, but judging from the symbol names, the game uses the official Speex library: https://www.speex.org/docs/api/speex-api-reference/globals.html

Notice the symbols like speex_wb_mode, speex_bits_init etc

LazyDuchess commented 1 year ago

Yup, I believe they're Speex with a custom header

berylliumquestion commented 1 year ago

How's the progress on this? I'm trying to figure out what to do next

LazyDuchess commented 1 year ago

Haven't touched this yet, I believe there is Speex source code floating around if you want to take a look, but it's all C/C++ as far as I know. Also just a really obscure format nowadays.

actioninja commented 1 year ago

~~Been picking at this one. I have a start but it's still pretty gnarly. Getting a rough idea of what the header looks like but basically every field I'm like "maybe this?" It looks like it's roughly: 4 byte magic number "1XPS", read as BE, 1 byte flag, if flag is 2 bytes unknown~~

Unfortunately the main implementations are all some cpp nonsense, so all the calls are behind vtables whose locations I haven't worked out. Still getting a handle on how Ghidra works, and a plugin to resolve RTTI wasn't working right. IDA was choking on it as well, so I'm giving Binary Ninja a shot.

The actual file seems to have the header followed by some kind of regular data, potentially padding, then something that seems to be the actual Speex payload. Once I find where the custom implementation ends and it's just calling the libspeex decode, it should be fairly easy from there.

LazyDuchess commented 1 year ago

> Been picking at this one. I have a start but it's still pretty gnarly. Getting a rough idea of what the header looks like but basically every field I'm like "maybe this?" It looks like it's roughly: 4 byte magic number "1XPS", read as BE; 1 byte flag; 4 bytes padding(?); 4 bytes either speex mode when flag is 0 or unknown when flag is 1; 2 bytes unknown
>
> Unfortunately the main implementations are all some cpp nonsense, so all the calls are behind vtables whose locations I haven't worked out. Still getting a handle on how Ghidra works, and a plugin to resolve RTTI wasn't working right. IDA was choking on it as well, so I'm giving Binary Ninja a shot.
>
> The actual file seems to have the header followed by some kind of regular data, potentially padding, then something that seems to be the actual Speex payload. Once I find where the custom implementation ends and it's just calling the libspeex decode, it should be fairly easy from there.

Hey! Thanks for checking this out, happy to see some progress.

I should probably link to this somewhere, maybe in the readme: there's a Mac build of the Bon Voyage executable with debug symbols, which might help as it reveals function and class names: Dropbox Link

actioninja commented 1 year ago

so turns out these aren't speex frames, they're some kind of further encoded audio frames that do some kind of nonsense before actually calling the speex frame decode. Fun.

Seems like it might just be 1 byte frame size followed by the speex frame? not sure.

actioninja commented 1 year ago

Tentatively saying I think I've got it, working on writing a tool to decode spx1 files to wav now. If that works, then this is correct, and the true test of it actually being accurate will be reencoding

header:
4 bytes: Magic Number (SPX1 in Little Endian)
1 byte: Always 1
4 bytes: data size of unencoded data, not actually used for decoding; seems to be some kind of reference number similar to other s2 datatypes
4 bytes: Speex mode, read as a signed type
2 bytes: largest speex frame, helps prevent reallocations when decoding because the same buffer is reused

payload (repeated an arbitrary number of times):
1 byte: frame size in bytes
(number of bytes specified by first byte): speex frame, can be directly decoded with libspeex

LazyDuchess commented 1 year ago

awesome, should be straightforward to turn into unity audioclips if the wav conversion works

lingeringwillx commented 2 weeks ago

The format suggested by @actioninja is roughly correct:

4 bytes: magic header (SPX1)
1 byte: number of channels (always 1, mono)
4 bytes: decoded size
4 bytes: speex mode (always 2, ultra-wideband mode, sampling rate 32 kHz)
2 bytes: samples per frame/decoded frame size (640 samples, or 1280 bytes)

loop until the end of the file:
1 byte: encoded frame size
encoded speex frame
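
The frame loop described above can be sketched in a few lines of Python. The name `iter_spx1_frames` is hypothetical, and the 15-byte header size is simply the sum of the field sizes listed above (4 + 1 + 4 + 4 + 2), assuming no padding. The sketch only splits the payload into encoded frames; it does not decode them.

```python
from typing import Iterator

HEADER_SIZE = 15  # 4 + 1 + 4 + 4 + 2 bytes, assuming the header has no padding


def iter_spx1_frames(data: bytes, offset: int = HEADER_SIZE) -> Iterator[bytes]:
    """Yield each encoded Speex frame that follows the header."""
    while offset < len(data):
        size = data[offset]          # 1 byte: encoded frame size
        offset += 1
        frame = data[offset:offset + size]
        if len(frame) != size:
            raise ValueError("file ends in the middle of a frame")
        offset += size
        yield frame
```

Each yielded frame is what would be handed to the Speex decoder.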

You would call speex_decode_int on each encoded frame to decode the file; the decoded frame size is always 640 samples (1280 bytes). This example on the Speex website shows a similar approach to encoding and decoding.

The total decoded file size actually comes out to be a little larger than the decoded size written in the header. This is likely because zeros were appended to the end of the file before encoding so that the last frame would have the same size as the other frames. To work around this you could just allocate your array/buffer to the decoded size from the header + 1280 bytes, so that you won't need to resize the array later.
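
A rough sketch of that buffer workaround, in Python. Here `decode_frame` is a hypothetical callable standing in for whatever wraps `speex_decode_int` (no real Speex binding is shown), and `decode_to_pcm` is a made-up name. The sketch assumes the header's decoded size is in bytes, and trimming the result back to that size is also an assumption, based on the guess that the extra samples are zero padding.

```python
from typing import Callable, Iterable

FRAME_BYTES = 1280  # 640 samples * 2 bytes per 16-bit sample


def decode_to_pcm(
    frames: Iterable[bytes],
    decoded_size: int,
    decode_frame: Callable[[bytes], bytes],  # hypothetical wrapper around speex_decode_int
) -> bytes:
    """Decode every frame into one preallocated buffer, then trim the padding."""
    # Header size plus one extra frame, so the buffer never needs resizing.
    pcm = bytearray(decoded_size + FRAME_BYTES)
    written = 0
    for frame in frames:
        chunk = decode_frame(frame)
        if len(chunk) != FRAME_BYTES:
            raise ValueError("decoder returned an unexpected frame size")
        if written + FRAME_BYTES > len(pcm):
            raise ValueError("decoded data exceeds the size implied by the header")
        pcm[written:written + FRAME_BYTES] = chunk
        written += FRAME_BYTES
    # Assumption: anything past decoded_size is padding added before encoding.
    return bytes(pcm[:min(written, decoded_size)])
```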

I've managed to decode the files using this format.
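
As a sketch of the last step, the decoded samples could be written to a WAV file with Python's standard `wave` module. The function name `write_wav` is hypothetical, and the sketch assumes the decoded data is 16-bit little-endian PCM, mono, at 32 kHz, matching the header values listed above.

```python
import wave


def write_wav(path: str, pcm: bytes, sample_rate: int = 32000) -> None:
    """Write decoded 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono, per the channel count in the header
        wav.setsampwidth(2)        # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```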

I found two C# libraries that can decode Speex:
NSpeex: Pure C# library. It appears to be used as a dependency in one popular library.
SpeexSharp: C bindings to the original Speex library.

LazyDuchess commented 2 weeks ago

That works great! Might implement NAudio as it's convenient for playing MP3s as well. Thank you!

LazyDuchess commented 2 weeks ago

Implemented, works like a charm!