dpethes / rerogue

Tools to extract data from Star Wars: Rogue Squadron 3D
http://satd.sk/web/rs/
GNU General Public License v3.0

TXT file datas types #4

Closed JackCarterSmith closed 2 years ago

JackCarterSmith commented 5 years ago

Hello,

Just to ask if new types of data have been identified?

I think the textures' alpha layer is decoded wrongly or is still unclear. I say that because some textures have a gradient effect (e.g. the light for imp_landingplatform) and the pilot rendering keeps its background layer.

dpethes commented 5 years ago

Hi, nobody is actively working on this. There are definitely some texture issues, repeated textures are broken as well, iirc.

JackCarterSmith commented 5 years ago

Hi, that's what I thought. What do you mean by repeated textures? Those that are applied on several faces simultaneously?

dpethes commented 5 years ago

No, those that are used as a repeating pattern, even on one face, see world devastator's side windows. Actually I think the issue was that I couldn't find the data that decides if texture should be repeating or clamped (it was quite some time ago, so I don't remember the specifics).

JackCarterSmith commented 5 years ago

Okay, thank you, I think that could be a good tip.

I'm about to rewrite parts of your code in C (for more versatility) and try to fill in the data that is still unknown. I will export all the texture data to hex dumps; I think the more data I have, the better my chances of working out the function of each segment. I can send you my progress through Discord (we are on the same server).

dpethes commented 5 years ago

If you look at hmt parser, ReadTexture() - there are some skipped/unknown bytes, it could be one of them. Not sure what you mean by more versatility, but go ahead, if you're more comfortable with C. If you stumble on something useful, update the file docs and make a pull request?

JackCarterSmith commented 2 years ago

Hi everyone,

I've continued the analysis of the game data, and I am stuck on what seems to be the in-game display text (subtitles, mission descriptions, etc.) in the _TXT file type.

I can draw a draft of some header information:

LE

Header | (Part_header | entries) * number of parts

Header? (28B):
unsigned short [2B]: number of parts (always 0x05)
unsigned short [2B]: count of entries?
unsigned int [4B]: offset to part 1
unsigned int [4B]: offset to part 2
unsigned int [4B]: offset to part 3
unsigned int [4B]: offset to part 4
unsigned int [4B]: offset to part 5
unsigned int [4B]: file size (or offset to EOF)

Part_header? (xB - first header can be used to estimate the size of part_header):
unsigned short [2B]: offset to the first entry from the start of part_header offset
repeat_for( offset to the first entry / 2B ) {
    unsigned short [2B]: offset to the X entry from the start of part_header offset
}

Entries:
ID-flags system/XOR encoded string/???
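
A minimal sketch of how the draft layout above could be read, assuming it is accurate. The names `parse_header` and `parse_part_offsets` are hypothetical, and treating the first short of a part header as the byte size of the whole offset table (so the table holds `first / 2` offsets, the first one included) is one reading of the pseudocode, not a confirmed fact.

```python
import struct

# Little-endian: 2 unsigned shorts, 5 part offsets, 1 file size (28 bytes).
HEADER_FMT = "<HH6I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def parse_header(data: bytes) -> dict:
    """Unpack the 28-byte file header described in the draft."""
    part_count, entry_count, *rest = struct.unpack_from(HEADER_FMT, data, 0)
    return {
        "part_count": part_count,    # always 5 in the files seen so far
        "entry_count": entry_count,  # meaning still uncertain
        "part_offsets": rest[:5],
        "file_size": rest[5],
    }

def parse_part_offsets(data: bytes, part_offset: int) -> list:
    """Return absolute file offsets of the entries of one part.

    Assumption: the first short is both the offset of the first entry and
    the size in bytes of the offset table itself.
    """
    (first,) = struct.unpack_from("<H", data, part_offset)
    count = first // 2
    relative = struct.unpack_from(f"<{count}H", data, part_offset)
    return [part_offset + r for r in relative]
```

Entry offsets are returned as absolute positions so that they can be compared directly with addresses seen in a hex editor.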

But the obfuscated string doesn't give me any ideas...

This is an example of an entry found in the file (front_TXT): FA B6 E0 AE EF A2 E7 B8 FB CA 95 D9 EB B7 E5 A0 EE AA EF B5 E3 AC F9 AA 8A C5 8B AB E9 A8 FA B1 F9 BC EF A7 A7 8D

I thought, at first, that there could be another system of indices/flags between the "letters" and the fonts located in another directory. If anyone has an opinion on this...

dpethes commented 2 years ago

frontTXT, this string starts at file offset 0x210. Changing a single byte changes two characters, so there's some bitpacking or compression going on.

Edit: it's not compression, rather some kind of scrambling. It looks like the file starts with strings beginning with '\LVNAME' (\LVNAME_C1_L1\AMBUSH AT MOS EISLEY etc.; I took a look at a memory dump), yet the bytes of every string are different.

JackCarterSmith commented 2 years ago

I came to the same conclusion last night, but thanks for this tip on 0x210 (I had managed to isolate a bigger piece of DATA.DAT directly from the pause menu); it will save me from having to restart a level each time... I haven't tried to change the memory on the fly yet, and I don't know whether the game would handle it properly.

I've seen that some compression techniques rely on dependencies between values, where the next value is determined by the previous one(s)... I don't know if that applies here.

I'll try an empirical approach by testing different bit combinations on a restricted area (0x210 of front_TXT seems perfect to me) and see if there is a way to extract something by deduction.

I'll post the test table if anyone sees anything.

dpethes commented 2 years ago

Note that the string starts before that offset: the right offset is determined by the "offset to the X entry from the start of part_header offset" field, as you found out. Changing the part of the LVNAME section that contains numbers causes crashes, though, as the game can't find the level name anymore. I also think that the decoded strings are zero-terminated: when I modified the end of a level name, the next level name was joined onto it. And good luck. I tried modifying the data around that offset in a hex editor, repackaging the files and launching RS a bunch of times, but I wasn't able to figure out what's going on.

JackCarterSmith commented 2 years ago

The data probably acts as a string tree, I suppose... If we change a value in the "path", it breaks the game (no error handling). For testing, we need to modify only the ending for comparison.

Maybe I can find the assembly code of the text algorithm that RS uses when it parses data from the file... It could take a long time, and luck, yes! Thanks for the info 👍

JackCarterSmith commented 2 years ago

Okay okay, I have some input data:

A9 1010 1001 \  |   FA 1111 1010 \
E5 1110 0101 L  |   B6 1011 0110 L
B3 1011 0011 V  |   E0 1110 0000 V
FD 1111 1101 N  |   AE 1010 1110 N
BC 1011 1100 A  |   EF 1110 1111 A
F1 1111 0001 M  |   A2 1010 0010 M
B4 1011 0100 E  |   E7 1110 0111 E
EB 1110 1011 _  |   B8 1011 1000 _
A8 1010 1000 C  |   FB 1111 1011 C
99 1001 1001 1  |   CA 1100 1010 1
C6 1100 0110 _  |   95 1001 0101 _
8A 1000 1010 L  |   D9 1101 1001 L
BB 1011 1011 1  |   EB 1110 1011 2
E7 1110 0111 \  |   B7 1011 0111 \
A6 1010 0110 A  |   E5 1110 0101 R
EB 1110 1011 M  |   A0 1010 0000 E
A9 1010 1001 B  |   EE 1110 1110 N
FC 1111 1100 U  |   AA 1010 1010 D
AF 1010 1111 S  |   EF 1110 1111 E
E7 1110 0111 H  |   B5 1011 0101 Z
C7 1100 0111    |   E3 1110 0011 V
86 1000 0110 A  |   AC 1010 1100 O
D2 1101 0010 T  |   F9 1111 1001 U
F2 1111 0010    |   AA 1010 1010 S
BF 1011 1111 M  |   8A 1000 1010  
F0 1111 0000 O  |   C5 1100 0101 O
A3 1010 0011 S  |   8B 1000 1011 N
83 1000 0011    |   AB 1010 1011  
C6 1100 0110 E  |   E9 1110 1001 B
8F 1000 1111 I  |   A8 1010 1000 A
DC 1101 1100 S  |   FA 1111 1010 R
90 1001 0000 L  |   B1 1011 0001 K
D5 1101 0101 E  |   F9 1111 1001 H
8C 1000 1100 Y  |   BC 1011 1100 E
8C 1000 1100    |   EF 1110 1111 S
A6 1010 0110    |   A7 1010 0111 H
                |   A7 1010 0111  
                |   8D 1000 1101

The data seems aligned with the "entries", so the header description can be partially confirmed. I can't find a "null" terminator or anything similar, but I notice that all entries end with the same byte twice, give or take one byte for padding or a calculation correction. Needs more investigation...
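
The "same byte twice" observation is easy to check mechanically. Below is a small sketch, with a hypothetical helper name, that lists every position where a byte equals its successor. It is run on the front_TXT entry quoted earlier in this thread.

```python
def doubled_byte_positions(data: bytes) -> list:
    """Return every index i where data[i] == data[i + 1]."""
    return [i for i in range(len(data) - 1) if data[i] == data[i + 1]]

# The front_TXT entry quoted earlier in the thread (38 bytes).
entry = bytes.fromhex(
    "FA B6 E0 AE EF A2 E7 B8 FB CA 95 D9 EB B7 E5 A0 EE AA EF B5 "
    "E3 AC F9 AA 8A C5 8B AB E9 A8 FA B1 F9 BC EF A7 A7 8D"
)
print(doubled_byte_positions(entry))  # [35] -> the A7 A7 pair near the end
```

In this sample the only doubled pair sits right before the last byte, which matches the pattern described above.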

I tested different bit combinations in the first and second chars of the string displayed in the mission selection for "BARKHESH" (* means a space or an empty char). I change only one byte at a time:

DE -> ** | 9E -> **
DF -> ** | 9F -> **

E0 -> W* | A0 -> EN
E1 -> VA | A1 -> DO
E2 -> UB | A2 -> GL
E3 -> TC | A3 -> FM
E4 -> SD | A4 -> AJ
E5 -> RE | A5 -> *K
E6 -> QF | A6 -> CH
E7 -> PG | A7 -> BI
E8 -> OH | A8 -> MF
E9 -> NI | A9 -> LG
EA -> *J | AA -> OD
EB -> *K | AB -> NE
EC -> *L | AC -> IB
ED -> ZM | AD -> HC
EE -> YN | AE -> K*
EF -> XO | AF -> JA
F0 -> GP | B0 -> U*
F1 -> FQ | B1 -> T_
F2 -> ER | B2 -> W*
F3 -> DS | B3 -> V*
F4 -> CT | B4 -> QZ
F5 -> BU | B5 -> P*
F6 -> AV | B6 -> SX
F7 -> *W | B7 -> RY
F8 -> OX | B8 -> *V
F9 -> NY | B9 -> *W
FA -> MZ | BA -> _T
FB -> L* | BB -> *U
FC -> K* | BC -> YR
FD -> J* | BD -> XS
FE -> I* | BE -> *P
FF -> H_ | BF -> ZQ

I can probably confirm the idea of "linked" bytes: when you change one byte, you change the first letter AND the next one; move to the next byte and the same thing repeats... For now I'm not sure exactly what operation is done to get the "value" of a char. It's certainly not compression, since the string size matches... But why? A safety feature against corrupted data? Maybe an unknown feature to make the font display "great"?

Something to try: replace only ONE char in the string...
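
Read with the XOR solution found further down this thread, the table behaves as if each displayed character were the XOR of two neighbouring bytes, which would explain why one changed byte alters two letters. The sketch below checks a few rows. The helper name is hypothetical, and the neighbouring bytes (B7, E5, A0, EE) are taken from the front_TXT sample on the assumption that the test touched the "RE" of "RENDEZVOUS".

```python
def shown_pair(prev_byte: int, changed: int, next_byte: int) -> str:
    """Two characters affected when the middle byte of three is changed."""
    return chr(prev_byte ^ changed) + chr(changed ^ next_byte)

# Byte context in the sample: ... B7 [E5] [A0] EE ...
print(shown_pair(0xB7, 0xE5, 0xA0))  # RE (unmodified first byte)
print(shown_pair(0xB7, 0xF2, 0xA0))  # ER (table row F2)
print(shown_pair(0xE5, 0xA4, 0xEE))  # AJ (table row A4)
```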

JackCarterSmith commented 2 years ago

I found a solution:

It's the doubled byte at the end that helped me: XOR the two and you get the famous \0 char!

Edit: The initial char uses the last byte of the previous entry. This implies we have to read all the entries in order and XOR each byte with the one before it.

Edit2: For the first entry of the section (which doesn't have an initial byte before it), we need to use a special "byte-key": 0xF5. Probably an "easter egg" from the Factor 5 studio.
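
A minimal sketch of the decoding described in these edits, run on the two entries from the table posted earlier. The function name `decode_entry` is hypothetical; the 0xF5 seed and the use of the previous entry's last byte come from the notes above.

```python
FIRST_KEY = 0xF5  # seed for the first entry of a section (Edit2)

def decode_entry(encoded: bytes, key: int) -> str:
    """XOR each byte with the one before it; stop at the zero byte."""
    out = bytearray()
    prev = key
    for b in encoded:
        c = b ^ prev
        if c == 0:  # a doubled byte decodes to the terminator
            break
        out.append(c)
        prev = b
    return out.decode("latin-1")

first = bytes.fromhex(
    "A9 E5 B3 FD BC F1 B4 EB A8 99 C6 8A BB E7 A6 EB A9 FC AF E7 "
    "C7 86 D2 F2 BF F0 A3 83 C6 8F DC 90 D5 8C 8C A6"
)
second = bytes.fromhex(
    "FA B6 E0 AE EF A2 E7 B8 FB CA 95 D9 EB B7 E5 A0 EE AA EF B5 "
    "E3 AC F9 AA 8A C5 8B AB E9 A8 FA B1 F9 BC EF A7 A7 8D"
)

print(decode_entry(first, FIRST_KEY))   # \LVNAME_C1_L1\AMBUSH AT MOS EISLEY
print(decode_entry(second, first[-1]))  # \LVNAME_C1_L2\RENDEZVOUS ON BARKHESH
```

In both samples the byte that follows the terminator XORs with it to 0x2A; what that trailing byte is for is not established in this thread.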

Edit3: The five sections correspond to the languages (in order):

JackCarterSmith commented 2 years ago

Rogue_Squadron_2022-09-28_18-56-14

Not so bad... ;)

dpethes commented 2 years ago

Good job! :D