SBAPI-Team / SmileBASIC-FileParser

A file parser library for SmileBASIC files.

Miscellaneous discoveries and corrections #4

Open CyberYoshi64 opened 2 years ago

CyberYoshi64 commented 2 years ago

DAT content types

SmileBASIC DAT files are not limited to content types 3 to 5. Types 0, 1 and 2 are also valid but left unused, as they were deemed unnecessary for the vanilla file types. These are Int8, Uint8 and Int16, respectively. They would come in handy for PCM streams or custom waveforms for MML, since you wouldn't have to convert the data yourself. SmileBoom might have planned to add content type 0x06 in SB4, but it seems the idea never got off the ground, as importing an SB3 DAT file with this content type crashes SB4 when it tries to load the file. Any other invalid type simply causes the parser to stop reading after the dimension length values and fill the array with zeroes.
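A minimal sketch of how a parser could handle these type codes, assuming little-endian element data. Types 0 to 2 are as described above and type 3 as Uint16 matches the GRP format; the Int32 and Float64 entries for types 4 and 5 are my assumption, and `DAT_TYPES` and `decode_dat_elements` are hypothetical names, not part of the library.

```python
import struct

# Type code -> (struct format character, element size in bytes).
# 0-2 come from the description above; 4 and 5 are assumptions.
DAT_TYPES = {
    0: ("b", 1),  # Int8
    1: ("B", 1),  # Uint8
    2: ("h", 2),  # Int16
    3: ("H", 2),  # Uint16 (GRP colors)
    4: ("i", 4),  # Int32 (assumed)
    5: ("d", 8),  # Float64 (assumed)
}


def decode_dat_elements(type_code, count, payload):
    """Decode `count` elements of the given type from `payload`."""
    entry = DAT_TYPES.get(type_code)
    if entry is None:
        # Mirror the observed behaviour: an unknown type gives an array of zeroes.
        return [0] * count
    fmt, size = entry
    needed = count * size
    if len(payload) < needed:
        raise ValueError("payload too short for %d elements" % count)
    return list(struct.unpack("<%d%s" % (count, fmt), payload[:needed]))
```

Returning zeroes for unknown types copies what SmileBASIC was observed to do; a stricter parser might prefer to raise an error instead.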

SB3's creator name encoding

SmileBASIC 3's creator/uploader name fields do not use UTF-8 encoding. Crafting a name with non-ASCII characters (in my case: H\x80\xFF\xCC\xF0\xA2䀵 lmao! XD) displays like so:

[Image: 2022-01-29_14-55-00] In-file encoding: UTF-8

[Image: 2022-01-29_15-34-37 532] In-file encoding: ANSI (raw)

The lines I'm talking about.

SmileBASIC 3 actually reads the name as ASCII. If a byte cannot be decoded, it throws its usual method out of the window, reads past the header, and may even loop around to find valid characters. (A rough investigation, but far enough.)

Nonetheless, you should probably sanity-check the name and replace invalid characters with '?' before passing it on when it's an SB3 file.
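A rough sketch of such a sanity check, assuming the name field is a fixed-size, NUL-padded byte buffer and that only printable ASCII should survive. `sanitize_sb3_name` is a hypothetical helper name, not an existing function of the library.

```python
def sanitize_sb3_name(raw: bytes) -> str:
    """Return the SB3 name with every non-printable-ASCII byte replaced by '?'."""
    # Assumption: the field is NUL-terminated/padded, so cut at the first NUL.
    name = raw.split(b"\x00", 1)[0]
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in name)
```

Working on raw bytes rather than a decoded string means one '?' per bad byte, so the result never gets longer than the original field.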

EDIT: Loving how I said the same thing twice but with different phrasing...

BrokenR3C0RD commented 2 years ago

Interesting. So following the pattern, I was right in thinking that GRP/color formats should be read as UInt16.

I'll work on adding in the extra decoding for the new DAT types. As for authors, I'll change it to ASCII encoding. I feel that SB4 may treat those as UTF-8 however, for some reason. If you could do a test on that, I'd appreciate it.

CyberYoshi64 commented 2 years ago

And yes! GRPs (type 3) are Uint16.

Sadly, I cannot test SmileBASIC 4 files, as I have a Switch Lite and I use SB3-formatted files to test on SB4. But the example binaries you had for SB4 projects already prove it uses UTF-8 for the name. It makes sense it uses UTF-8, as unlike SB3, where the name is just the NNID (strictly alphanumeric), SB4 uses the account name which can contain Unicode.

EDIT: [Image: Bildschirmfoto_2022-01-29_17-40-08] Yes! This is indeed UTF-8-encoded for SB4. (This is from a separate Python script, a personal tool for messing with SmileBASIC files.)
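The version split could be sketched like this, assuming the parser already knows whether the file is SB3 or SB4 and that the field is NUL-padded. `decode_author` and its `is_sb4` parameter are hypothetical names for illustration.

```python
def decode_author(raw: bytes, is_sb4: bool) -> str:
    """Decode a creator/uploader name field according to the file version."""
    # Assumption: the field is NUL-padded, so cut at the first NUL byte.
    name = raw.split(b"\x00", 1)[0]
    if is_sb4:
        # SB4 account names may contain Unicode and are stored as UTF-8.
        return name.decode("utf-8", errors="replace")
    # SB3 names are NNIDs, so anything outside ASCII is treated as invalid.
    return name.decode("ascii", errors="replace").replace("\ufffd", "?")
```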

CyberYoshi64 commented 2 years ago

I have now made my own command-line parser, as I have showcased, and I found new values in the common headers that are only affected server-side. https://github.com/CyberYoshi64/cy64-scriptbox/blob/main/CY64SBFP/PTCFile/SBFile/CommonHeader.py#L7 (jsyk, I'm not trying to compete; I only made this tool for myself to convert/edit SmileBASIC files, which is why I don't care that it is really complicated to use. I only link it here to tell you that I've been really interested in file parsing lately and want to contribute as much as I can right now. tbf life is really boring right now, so I figured why not lmao)

I named them "Upload IDs". They basically indicate in which upload the file was created and in which it was last modified and uploaded. If a fresh new file is uploaded, both upload IDs are updated. Downloading the file, modifying it and uploading it again changes the editor's upload ID. I'm not entirely sure how they are used, but my guess is that they serve the update procedure on SB4. I might be wrong, and SB4 may simply ask the server for the upload history instead (like I saw with some SB4 projects with SBAPI), but I digress. Modifying the editor's upload ID does not work; it always gets updated. The creator's upload ID can be modified, but that has no effect. The server likely only checks whether the ID is set to 0x0, in which case it updates it to the appropriate number.

Not sure whether I'm parsing it wrong, but SB4 changed it up from SB3. SB3 uses simple Uint64s that count up with each upload done on the server. SB4 also uses Uint64s, but shifted, with the highest bit set (?). The file from the image I just showed returns 0x800006A100000000 and 0x800006A200000000 for the creator and uploader, respectively. If it was parsed wrong, then the extra Uint32 in SB4 files, which is always zero, might just be padding?
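To make the "shifted, with the highest bit set" reading concrete, here is a small sketch that splits such a Uint64 into its parts. The field layout is only the guess described above, not a confirmed format, and `split_sb4_upload_id` is a hypothetical name.

```python
def split_sb4_upload_id(value: int) -> tuple[bool, int, int]:
    """Split a 64-bit SB4 upload ID into (high bit, counter, low word).

    Assumed layout (a guess, not confirmed): bit 63 is a flag, bits 32-62
    hold the upload counter, and the low 32 bits are zero or padding.
    """
    high_bit = bool(value >> 63)
    counter = (value >> 32) & 0x7FFFFFFF
    low_word = value & 0xFFFFFFFF
    return high_bit, counter, low_word
```

Under this reading, the two values above yield counters 0x6A1 and 0x6A2, consecutive numbers, which fits the idea of a counter incremented per upload.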

CyberYoshi64 commented 1 year ago

Something just worth mentioning, no need to implement that.

The zLib compression flag (common header byte 0x04)? It is actually a bitmask, intended to hold internal flags.

Bit 0 is the zLib compression flag that we know. Bit 1 is responsible for locking down read access (for a GRP), causing a "Protected resource" error. Yes, the same thing that would happen with the Japan-exclusive Content IP DLCs. This flag also works on DATs (since loading a GRP into an array basically makes it load as a DAT), where loading fails with that error immediately, which is to be expected. The other bits appear to be unused and are not checked.
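A minimal sketch of reading that byte as a bitmask, using the offset 0x04 and the two bit meanings given above. The names `HeaderFlags`, `ZLIB_COMPRESSED`, `PROTECTED` and `read_header_flags` are made up for illustration.

```python
from enum import IntFlag


class HeaderFlags(IntFlag):
    ZLIB_COMPRESSED = 0x01  # bit 0: payload is zlib-compressed
    PROTECTED = 0x02        # bit 1: read access locked ("Protected resource")


def read_header_flags(header: bytes) -> HeaderFlags:
    """Read the flag byte at offset 0x04 of the common header."""
    # Only the two known bits are kept; the rest appear to be unused.
    return HeaderFlags(header[0x04] & 0x03)
```

A parser could then test `HeaderFlags.ZLIB_COMPRESSED in flags` instead of comparing the whole byte against 1, which would otherwise misread protected files.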

Thought it'd be interesting to tell you that as well. :)

BrokenR3C0RD commented 1 year ago

Thank you for all the interesting information! It's been a while since someone other than myself dug around in the deep depths of SmileBASIC's file format, so it's nice to see what a fresh eye finds :))

A few things to add onto what I've been able to gather as I rewrite the file parser in Rust:

CyberYoshi64 commented 1 year ago

PNG file "GRPs" are really funny actually. I haven't touched them in ages, but they actually don't have a header, unlike JPEGs. Instead, it's just... a PNG file. The only examples I've seen in the wild though are in the RomFS of SB4. If you're interested in taking a look, that never hurts!

Huh, that's actually really interesting; I want to experiment with that so bad. Sadly I'm a little crippled, as I don't own a hacked Switch and won't for a while; I do everything with just a hacked version of SB3. The way I tested SB4 was by smuggling SB4 files through SB3, which barely works and fails immediately when the data is any of SB4's unique file types, as SmileBASIC differentiates file types primarily via the file prefix. The header seems to be more of a backup approach. Using a proxy to upload files with any prefix but T/B was unsuccessful, or at least the method I use might be faulty...

BrokenR3C0RD commented 1 year ago

Ah, another thing!

Data type 6 for SB4 files IS actually implemented! Those are string data files; each string is stored as a U32 length followed by an array of UTF-8 characters of that length. Forgot to mention that one, didn't click until now hehe
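A rough sketch of reading such string data, assuming little-endian lengths, lengths counted in bytes rather than characters, and an element count known from the DAT dimensions. `read_dat_strings` is a hypothetical name. The bounds check rejects absurd lengths instead of trying to allocate them.

```python
import struct


def read_dat_strings(payload: bytes, count: int) -> list[str]:
    """Read `count` length-prefixed UTF-8 strings from `payload`."""
    strings = []
    offset = 0
    for _ in range(count):
        if offset + 4 > len(payload):
            raise ValueError("truncated length field")
        # Assumption: the U32 length is little-endian and counts bytes.
        (length,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        if length > len(payload) - offset:
            raise ValueError("string length exceeds remaining data")
        strings.append(payload[offset:offset + length].decode("utf-8"))
        offset += length
    return strings
```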

CyberYoshi64 commented 1 year ago

Data type 6 for SB4 files IS actually implemented! string data files — U32 length followed by an array of UTF-8 characters with count length — forgot to mention that one

Oh lol, I see. The data I used would have been read as string data over 2GB in size, which is too much for the Switch, so it crashes instead :joy:

Arrgh... my rudimentary smuggling makes stuff like this hard...