dkfans / keeperfx

Open source remake and Fan Expansion of Dungeon Keeper.
https://keeperfx.net/
GNU General Public License v2.0

UTF-8 support #2896

Open AdamPlenty opened 11 months ago

AdamPlenty commented 11 months ago

Related issue: #2330

The other week, the media were praising KeeperFX for bringing Dungeon Keeper into the modern age, but one area in which even KeeperFX remains stuck in the 90s is character encoding. This is especially problematic for quick messages in level scripts; they are code page-dependent, which can lead to issues:

[Three screenshots showing corrupted quick message text]

We need a universal code page (such as UTF-8) and font, not just for quick messages, but for the entire game. Otherwise, this'll continue to be an issue. With a universal system, anyone would be able to use any text, and it'll work regardless of language.

eddebaby commented 11 months ago

Unicode support is a topic I have looked into for my own interests, in particular the question:

"is it feasible for a program to support all languages with the distributed binaries alone?"

The short answer is "not today". The longer answer is "not without both substantial work and binary size". As far as I am aware it has never been achieved by anyone.

What follows is my musings on approaching an answer to the question:

"if KeeperFX supports Unicode and UTF-8 encoding, does it improve the support of the currently supported languages, and does it make adding new languages easier?"

I have no definitive answer, and I would suggest that it would be quite some work to reach one! So either answer the question definitively to decide if Unicode should be supported, or implement Unicode to find out if it was worth the effort ;p

To implement Unicode in FX we would need to:

Then, additional Unicode characters could be added as glyphs in new or existing FX fonts - this is manageable so long as each language is handled one at a time.

Map makers would still need to save their work correctly when working in external programs though (i.e. don't save with the wrong encoding). However, if we switch all shipped files to UTF-8 encoding, then for anyone who copies/edits the shipped files: "all characters supported by FX" will be rendered in game. This presumes that FX will have enough context to determine the language used (see below), which in most cases will be covered by the player's FX language setting and the map maker being "language aware" when working - but there may be edge cases that lead to undesirable results for the player.

See https://en.wikipedia.org/wiki/Open-source_Unicode_typefaces for why implementing Unicode and UTF-8 support doesn't magically support all languages. The summary is:

Hypothetical example: A map maker editing a file in Notepad++ with UTF-8 encoding will be able to see all the characters supported by the font they are using in Notepad++. A player playing "UTF-8 KFX" will be able to see all the characters supported by our font. In both Notepad++ and KFX unsupported characters will not be rendered as intended. So, given that KeeperFX supports a finite list of languages, it needs to be the case that ALL unsupported characters are from languages that are not in the list of supported languages - I believe that is the current status quo of master (i.e. achieved via codepages).
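The coverage argument above can be sketched as a simple set-membership check. This is an illustration only: `SUPPORTED_GLYPHS` and `unsupported_chars` are made-up names, and the glyph set is an assumption, not the contents of any real KeeperFX font.

```python
# Hypothetical glyph set: printable ASCII plus a few accented letters.
# This is NOT the real KeeperFX font coverage; it is an assumption for illustration.
SUPPORTED_GLYPHS = set(map(chr, range(0x20, 0x7F))) | set("äöüßéèñ")


def unsupported_chars(text, glyphs=SUPPORTED_GLYPHS):
    """Return the characters in `text` that have no glyph, in first-seen order."""
    missing = []
    for ch in text:
        if ch not in glyphs and ch not in missing:
            missing.append(ch)
    return missing


print(unsupported_chars("Größe"))        # []
print(unsupported_chars("Dobře, 日本"))   # ['ř', '日', '本']
```

Decoding the file as UTF-8 gets the right code points in both cases; whether anything useful appears on screen still depends on the font having a glyph for each of them.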

To summarise:

I'd also add that Unicode is only needed when you want to be able to display different languages on the screen at the same time from the same data source (and the characters for all of the languages you do want to show do not exist inside a single codepage). So another question that arises is:
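A rough way to test the "does it fit inside a single codepage" condition is to try encoding the text in each candidate code page. The sketch below uses Python's built-in codecs as a stand-in; the list of code pages (`cp1250`, `cp1252`, `cp932`) and the function name are assumptions chosen for illustration, not what KeeperFX actually ships.

```python
def code_pages_that_fit(text, code_pages=("cp1250", "cp1252", "cp932")):
    """Return the code pages (Python codec names) that can encode every character of `text`."""
    fits = []
    for page in code_pages:
        try:
            text.encode(page)
        except UnicodeEncodeError:
            # At least one character has no slot in this code page.
            continue
        fits.append(page)
    return fits


print(code_pages_that_fit("Hello"))          # ['cp1250', 'cp1252', 'cp932']
print(code_pages_that_fit("Dobře"))          # ['cp1250']
print(code_pages_that_fit("Dobře / Señor"))  # []
```

The last case is the one where a single code page is not enough: Czech and Spanish text on the same line fits none of the three, so only a Unicode encoding could carry it from one data source.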

"in what instances does FX need to use Unicode?"

I do not know very much about the language handling/use in KeeperFX, so I'm unable to answer this question.

AdamPlenty commented 11 months ago

Quick messages in level scripts definitely ought to be in UTF-8, because players would want it to work even if the language is set to Japanese or whatever. As it is, we can forget about having quick messages in a language other than English and expecting them to work in all languages, because the game simply won't display the correct characters; for example, if the language is Japanese, everything (including quick messages) is interpreted as code page 932, which can result in text corruption, as seen above. This is because quick messages can only be in one code page, and they are not language-dependent; they are written directly in the script and just are what they are. The same goes for all languages and code pages.
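To make the failure mode concrete, here is a minimal sketch using Python's codecs rather than the game's own text code. The message and the choice of `cp1250` as the map maker's code page are assumptions for illustration; the point is only that the same bytes mean different things under different code pages, while UTF-8 bytes round-trip regardless of the language setting.

```python
message = "Dobře"  # hypothetical quick message written by a map maker

# Bytes as an editor saving in code page 1250 would write them.
raw_cp1250 = message.encode("cp1250")

# The same bytes read under a different code page.
as_cp1252 = raw_cp1250.decode("cp1252")
# errors="replace" because not every byte sequence is valid in code page 932.
as_cp932 = raw_cp1250.decode("cp932", errors="replace")

print(as_cp1252)            # Dobøe
print(as_cp932 != message)  # True: the text no longer matches what was written

# With UTF-8 there is only one interpretation of the bytes.
raw_utf8 = message.encode("utf-8")
print(raw_utf8.decode("utf-8") == message)  # True
```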

One workaround might be to have level-specific text dat files for each language, like we do with map packs and campaigns, but even that is less than ideal, because the same text would need to be encoded in all code pages (we'd need several dat files for the same text even if it's not translated), and I'm not sure the East Asian font even supports letters with diacritical marks.