FireEmblemUniverse / fireemblem8u

Decompilation/disassembly of Fire Emblem: The Sacred Stones
https://fireemblemuniverse.github.io/fireemblem8u/

Handling of accentuated characters #681

Open minirop opened 2 weeks ago

minirop commented 2 weeks ago

I'm working on the European release of FE8 and would like to know the best way to handle äll thôsè characters, with the eventual goal of building both ROMs from the same source code.

1/ Do what is currently done for é, and have a tag like [AccentedE] for each accented letter.

2/ Since FE8 uses Windows-1252, which matches Unicode for everything except the 0x80–0x9F range, use the Unicode characters directly (or via their \xNN escapes), and only have a [specialCase] tag for œ and Œ.
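The "almost compatible" claim can be checked with Python's standard cp1252 codec. This is a sketch that assumes the game's byte values really follow Windows-1252: every byte from 0xA0 to 0xFF decodes to the code point with the same value, so chr() is enough there, while 0x80–0x9F is where the two encodings differ. That range contains Œ (0x8C) and œ (0x9C).

```python
def cp1252_matches_unicode(byte: int) -> bool:
    """True if Windows-1252 decodes `byte` to the code point with the same value."""
    try:
        return bytes([byte]).decode("cp1252") == chr(byte)
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
        return False

# Every byte where a plain chr() would produce the wrong character.
mismatches = [b for b in range(0x100) if not cp1252_matches_unicode(b)]
print(f"{len(mismatches)} mismatches, from 0x{mismatches[0]:02X} to 0x{mismatches[-1]:02X}")

# The two ligatures live in the mismatching range, so they need special handling.
print(bytes([0x8C, 0x9C]).decode("cp1252"))
```

The mismatches all fall between 0x80 and 0x9F. For a game that only uses Œ and œ from that range, two special cases are enough.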

MokhaLeee commented 1 week ago

If you mean hardcoded const data (strings in the C source), you can dig into the Makefile.

In FE6/FE7J, the hardcoded strings are Shift-JIS encoded, so you need to convert the UTF-8 source (the default on modern Linux) to Shift-JIS with iconv -f UTF-8 -t CP932.
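To test the conversion outside the build, you can reproduce it in Python with the standard utf-8 and cp932 codecs. CP932 is Microsoft's variant of Shift-JIS. The function name `utf8_to_cp932` is hypothetical; the linked Makefile calls iconv itself.

```python
def utf8_to_cp932(source: bytes) -> bytes:
    """Re-encode UTF-8 bytes as CP932, roughly like `iconv -f UTF-8 -t CP932`.

    Raises UnicodeEncodeError if the text has a character CP932 cannot represent.
    """
    return source.decode("utf-8").encode("cp932")

# ASCII passes through unchanged; kana become two-byte Shift-JIS sequences.
converted = utf8_to_cp932("あ A".encode("utf-8"))
print(converted)  # b'\x82\xa0 A'
```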

https://github.com/FireEmblemUniverse/fireemblem6j/blob/main/Makefile#L265

minirop commented 1 week ago

No, I mean the game's text, as handled by textdeparser.py: https://github.com/FireEmblemUniverse/fireemblem8u/blob/master/scripts/texttools/textdeparser.py#L101

MokhaLeee commented 1 week ago

After Huffman decompression, each string becomes an array of u16 values. How that array maps to text depends on the encoding: the JP version uses 16-bit Shift-JIS characters directly, plus some control codes, while the US version uses ASCII.
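For the US/ASCII case, mapping the decompressed u16 array to text can be sketched in Python as shown below. The control-code values and tag names in `EXAMPLE_CONTROL_TAGS` are placeholders. They are not FE8's real table, which lives in textdeparser.py. Unknown values are written out as hex tags, so nothing is silently dropped.

```python
# Hypothetical control-code table, for illustration only; the real codes and
# tag names are defined in textdeparser.py.
EXAMPLE_CONTROL_TAGS = {
    0x00: "[X]",   # assumed end-of-string marker
    0x01: "[NL]",  # assumed newline code
}

def decode_ascii_u16(values: list[int], control_tags: dict[int, str]) -> str:
    """Turn a decompressed u16 array into text, US/ASCII style."""
    parts = []
    for v in values:
        if v in control_tags:
            parts.append(control_tags[v])
        elif 0x20 <= v <= 0x7E:
            # Printable ASCII maps straight to the character with that code.
            parts.append(chr(v))
        else:
            # Unknown value: keep it visible as a hex tag.
            parts.append(f"[0x{v:04X}]")
    return "".join(parts)

print(decode_ascii_u16([0x48, 0x69, 0x01, 0xE9, 0x00], EXAMPLE_CONTROL_TAGS))
# Hi[NL][0x00E9][X]
```

The accented é (0xE9) falls through to the hex tag. That gap is the one the EU version needs to fill.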

minirop commented 1 week ago

The EU version uses the same encoding as the US version, just with more accented characters, so either I add more tags:

elif u16_data == 0xC8:
    output = "[UppercaseGraveAccentE]"
elif u16_data == 0xC9:
    output = "[UppercaseAcuteAccentE]"
elif u16_data == 0xE8:
    output = "[GraveAccentE]"
elif u16_data == 0xF4:
    output = "[CircumflexO]"
# ...
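If option 1 is chosen, a lookup table is easier to maintain than a long elif chain. The same table can also be inverted for the other direction, from tag back to u16 value, when the text is rebuilt. The Python sketch below uses hypothetical names (`ACCENT_TAGS`, `TAG_TO_CODE`, `tag_for`) and fills in only the four tags from the snippet above.

```python
from typing import Optional

# Tag names copied from the elif chain; the remaining letters would follow the same pattern.
ACCENT_TAGS = {
    0xC8: "[UppercaseGraveAccentE]",
    0xC9: "[UppercaseAcuteAccentE]",
    0xE8: "[GraveAccentE]",
    0xF4: "[CircumflexO]",
}

# Inverse table, for turning tags back into u16 values.
TAG_TO_CODE = {tag: code for code, tag in ACCENT_TAGS.items()}

def tag_for(u16_data: int) -> Optional[str]:
    """Return the tag for an accented character code, or None if it has none."""
    return ACCENT_TAGS.get(u16_data)
```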

or simply:

elif 0xA1 <= u16_data <= 0xFF:
    output = chr(u16_data)

(for my tests I'm using the latter and it seems to work as expected)
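Option 2 from the opening post can be sketched as a pair of Python helpers. They use chr()/ord() for 0xA1–0xFF, as the snippet above does, and add tags only for Œ and œ. The tag names `[UppercaseOE]`/`[LowercaseOE]` and the helper names are hypothetical. The 0x8C/0x9C values assume the game follows Windows-1252 for these two ligatures.

```python
from typing import Optional

# Hypothetical tags for the two ligatures whose Windows-1252 bytes differ from Unicode.
LIGATURE_TAGS = {0x8C: "[UppercaseOE]", 0x9C: "[LowercaseOE]"}
LIGATURE_CODES = {tag: code for code, tag in LIGATURE_TAGS.items()}

def decode_extended(u16_data: int) -> Optional[str]:
    """Map an EU-only character code to text, or None if it is not one."""
    if 0xA1 <= u16_data <= 0xFF:
        # Windows-1252 agrees with Unicode here, so chr() is exact.
        return chr(u16_data)
    return LIGATURE_TAGS.get(u16_data)

def encode_extended(token: str) -> Optional[int]:
    """Inverse of decode_extended, for the text-parser side of the build."""
    if token in LIGATURE_CODES:
        return LIGATURE_CODES[token]
    if len(token) == 1 and 0xA1 <= ord(token) <= 0xFF:
        return ord(token)
    return None
```

Because the helpers are inverses of each other, the script files can contain plain "é" or "ô", and the build can still regenerate the original u16 values.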