bvschaik / julius

An open source re-implementation of Caesar III
GNU Affero General Public License v3.0

hard to improve some Chinese translation #689

Closed: feiyunw closed this issue 1 year ago

feiyunw commented 1 year ago

It's hard to improve some Chinese translation.

Per the source:

// src/core/encoding_simp_chinese.c:
static const chinese_entry codepage_to_utf8[IMAGE_FONT_MULTIBYTE_SIMP_CHINESE_MAX_CHARS] = {
    {0x8080, {0xe6, 0xa1, 0xa3}},
...
    {0x90d1, {0xe5, 0x87, 0xb8}},
};

// src/core/encoding_trad_chinese.c:
static const chinese_entry codepage_to_utf8[IMAGE_FONT_MULTIBYTE_TRAD_CHINESE_MAX_CHARS] = {
    {0x8080, {0xef, 0xbc, 0x81}},
...
    {0x918b, {0xe5, 0xbe, 0xb9}}
};

It uses a highly customized encoding system for Chinese ideographs, in which the coded characters start at 0x8080. None of the popular Chinese encoding systems uses such a scheme. As a result, the game requires a special tool to subset and reorder the characters coded by an official standard (such as GB 18030 for Simplified Chinese), and a customized font system to display them.

For Simplified Chinese, this customized encoding is used in the files loaded by load_text() in /src/core/lang.c. It keeps muggles from doing any translation improvement work. To fix this, we should abandon this encoding, at least for translation text, and use UTF-8 (or GB 18030 in the Simplified Chinese case) instead. It would be better still to put translations in a plain text file, or in a scripting language that supports UTF-8, such as Lua.

A modern font system would then be helpful, but that is another topic.

FYI: the two-byte Simplified Chinese ideographs defined by GB 18030 begin at 0x8140, and there are 21008 coded characters.

Links: GB 18030; GB 18030-2022 (page 5, Fig. 3); Output GB 18030-2022 repertoire by Lua; pyftsubset; Region-specific subset OTFs (Simplified Chinese)

crudelios commented 1 year ago

The problem with abandoning the encoding system is that the original translation files rely on it to work.

We can't distribute the original translation files, even in a new format, as that would violate copyright.

So I don't think there's any reasonable solution for this issue.

Also, font rendering can't really be changed without massive changes to the game's internal engine, which goes beyond the scope of this project.

bvschaik commented 1 year ago

As Crudelios said, we have the original C3 translators/bootleggers for Chinese to thank for coming up with this crazy encoding scheme back in the late '90s. Since we cannot alter the encoding without becoming incompatible with the original game's files, and distributing the files in another encoding is a no-go, the only thing we can do is keep it.

Font rendering is another issue: the Chinese translations (or actually any translation) work on the basis that each character is an image. Any character that we don't have an image for cannot be displayed. The Chinese font in the game only provides images for a subset of around 2000 of the 21008 characters you mention. Adding more is not possible without major changes, which is indeed beyond our scope.

> It prevents muggles from translation improvement work.

For the muggles, we have EngConverter. I now see that the latest release (0.4) did not include support for the variants of Simplified/Traditional Chinese used in Caesar 3, so I have uploaded version 0.5, which adds that support.

feiyunw commented 1 year ago

I think Julius is based on the original English version of C3. The original encoding system is essential to keep compatibility with the original English assets. Languages other than English can be fan-supported works, so it is reasonable to treat them as add-on assets handled by add-on algorithms.

feiyunw commented 1 year ago

Regarding copyright: we are talking about how to encode, decode, and check the game content, not about the content itself. By "translation" I mean languages other than English, not necessarily a translation of the C3 English assets. That should be safe.

bvschaik commented 1 year ago

> Languages other than English can be fan-supported works, thus it's reasonable for add-on assets supported by add-on algorithms.

Yes, we already have a Greek fan translation in the works which does exactly that.

> I think Julius is based on the original English version of C3.

The language support in Julius is based on the translated versions of the game that were actually released. For Chinese, CDs of C3 in Chinese were sold, so we use those as the base. The copyright of those translations lies with the companies that produced them, so copyright is still an issue.