Ancurio / mkxp

Free Software implementation of the Ruby Game Scripting System (RGSS)
GNU General Public License v2.0

UTF-8 equivalence issue on OS X with pathCache #52

Closed elizagamedev closed 10 years ago

elizagamedev commented 10 years ago

It seems that RPG Maker/Ruby expect NFC filenames, but OS X normalizes filenames with (an outdated and frozen variant of) Unicode NFD on the filesystem. Here's an article explaining the difference, since I only learned about this topic after experiencing issues with filenames transferred from OS X to Linux.

As a result, the path cache generated by MKXP for files not in RGSS archives on OS X will be incorrect for filenames containing characters that OS X stores in decomposed form (which are very common in Japanese games). Disabling pathCache seems to work around this, but it still seems to randomly cause some graphical issues: sometimes the title screen BG is missing, sometimes the menu doesn't update when the cursor is moved, sometimes it works perfectly. I can't tell if this is actually related to the path cache, but I haven't ever experienced these problems until now.
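To illustrate the mismatch described above, here is a minimal sketch in Ruby. It assumes the cache behaves like a simple lookup table keyed by the names the filesystem reports; the constants `NFC_NAME`, `NFD_NAME`, and `PATH_CACHE` are hypothetical and are not mkxp's actual data structures.

```ruby
# Two spellings of "が.png" that render identically but differ byte for byte.
NFC_NAME = "\u304C.png"        # precomposed が (what game scripts typically request)
NFD_NAME = "\u304B\u3099.png"  # か + combining dakuten (what OS X reports)

puts NFC_NAME.bytesize  # 7
puts NFD_NAME.bytesize  # 10

# Hypothetical stand-in for the path cache: built from names as the
# filesystem enumerates them, so on OS X the keys are decomposed.
PATH_CACHE = { NFD_NAME => "/Graphics/#{NFD_NAME}" }.freeze

# A lookup using the NFC spelling misses even though the file exists.
p PATH_CACHE.key?(NFC_NAME)  # false
p PATH_CACHE.key?(NFD_NAME)  # true
```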

A suggested fix: OS X comes with a version of libiconv which can handle both NFC UTF-8, which is used by the regular old "utf-8" codec, and OS X's particular variant of NFD, which is used by the "utf-8-mac" codec. When the target host is OS X, all filename data returned by external libraries should be run through iconv to be normalized as NFC. When accessing files, outgoing filenames are automatically normalized by OS X, so iconv shouldn't be necessary for that.
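The idea of normalizing incoming filenames can be sketched roughly as follows. This is not the proposed iconv patch: Ruby's `String#unicode_normalize` (standard Unicode NFC) is swapped in for the "utf-8-mac" to "utf-8" conversion, and it may differ from Apple's frozen variant for some characters. The helpers `to_nfc` and `build_path_cache` are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: bring a filename reported by the filesystem to NFC.
# Uses standard NFC as a stand-in for an iconv "utf-8-mac" -> "utf-8" pass.
def to_nfc(name)
  name.unicode_normalize(:nfc)
end

# Hypothetical cache builder: keys are NFC, values keep the on-disk spelling.
def build_path_cache(names)
  names.each_with_object({}) { |name, cache| cache[to_nfc(name)] = name }
end

# Names as a decomposing filesystem might report them.
reported = ["\u304B\u3099.png", "Title.png"]
cache = build_path_cache(reported)

# A script asking for the precomposed spelling now finds the entry.
p cache.key?("\u304C.png")  # true
```

Only names coming from the filesystem are normalized in this sketch. The cached value keeps the on-disk spelling, which matches the observation that outgoing filenames need no conversion.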

cremno commented 10 years ago

Ruby has its own encoding converter:

"\u{03D3}".encode(Encoding::UTF_8_MAC, Encoding::UTF_8)  # => "\u03D2\u030D"

Ancurio commented 10 years ago

@mathewv Thanks for the info. I read up on it a bit and this is indeed a problem. I am very hesitant to add OS-specific code to mkxp, but in this case there might not be another option, at least until we find a better solution. It would be cool if physfs could just do the conversion to NFC for us, but I doubt patches adding a dependency on iconv would be accepted.

This brings another question though: Are we guaranteed to be provided with NFC strings from the scripts? I'm pretty sure the utf8 conversion in the MRI binding will do that, but I'm thinking more generally. Are the strings (= filenames) contained in the raw scripts as saved by RPG Maker guaranteed to be NFC?

Regarding the random glitches when turning off pathCache: These should not be related. You mention you haven't experienced these problems "until now"; what do you mean by that? What changed?

@cremno We can't use Ruby here because the file lookup happens in core mkxp, not the bindings (i.e. it wouldn't solve mruby/possible future bindings).

elizagamedev commented 10 years ago

I've written two UTF-8 text files containing the 'が' character, one NFC, the other NFD, and tried copying the contents of each into the script editor of a freshly-installed trial version of RPG Maker XP in a Windows 7 VM. The NFD version doesn't render properly in either Notepad or the script editor (screenshot of that here). I think with this we can reasonably assume that all text created by RPG Maker (and all unofficial editors) is NFC. I think we can also assume that RGSS archives have only NFC filenames, since that seems to be the norm for Windows filenames. I guess it's possible for an unofficial RGSS archiver program to create NFD filenames when run on OS X, but the game would probably break with the official RGSS engines, so I don't think that needs to be considered.
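A test pair like the one described above could be produced along these lines. This is a rough sketch, not the exact procedure used: the file names `nfc.txt` and `nfd.txt` and the helper `write_samples` are hypothetical, and the files are written to a temporary directory.

```ruby
require "tmpdir"

GA_NFC = "\u304C"        # が as a single precomposed code point
GA_NFD = "\u304B\u3099"  # か followed by the combining dakuten

# Hypothetical helper: write one sample file per normalization form
# and return the paths that were written.
def write_samples(dir)
  { "nfc.txt" => GA_NFC, "nfd.txt" => GA_NFD }.map do |fname, text|
    path = File.join(dir, fname)
    File.binwrite(path, text)  # raw UTF-8 bytes, no BOM, no newline
    path
  end
end

Dir.mktmpdir do |dir|
  write_samples(dir).each do |path|
    text = File.read(path, encoding: "UTF-8")
    puts "#{File.basename(path)}: #{text.bytesize} bytes, " \
         "NFC? #{text.unicode_normalized?(:nfc)}"
  end
end
```

The same `unicode_normalized?(:nfc)` check could be applied to any filename string to see whether it is already in NFC.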

It's hard to say exactly what's causing the glitches I was running into, since I haven't tested mkxp too much on OS X. I've tested two other games without Japanese filenames and they seemed to work fine, but I can't remember exactly which ones those were. The only difference between those games and this one is... well, I guess basically everything. You can download it here if you'd like to try for yourself. I'd try it on Linux, but for some reason mkxp suddenly segfaults on the draw_text function...

Ancurio commented 10 years ago

Thanks for your experiments, mathewv. According to this, it seems indeed that NFC is the preferred normalization on Windows, so let's base our assumptions on that. Since I'm not able to do any development on OS X, I cannot write/test a patch that does the iconv conversion; do you think you could do that? I was thinking of simply passing the opened iconv_t as a third struct member in the physfs callback data struct, something like that.

Concerning the glitches, I have downloaded the game you linked and tried it out (on my Linux system), and there was nothing suspicious I could see. Everything behaved as expected. Can you open a 2nd issue with screenshots of the glitches you're seeing?

elizagamedev commented 10 years ago

Sure, I'd love to help. I'll try writing a patch sometime this week before classes start. Before I open an issue for those other glitches, though, I want to mess around with it more and make sure it's not something I'm causing.

cremno commented 10 years ago

@Ancurio: I've mixed something up. I assumed that mkxp (well, just the MRI binding) already converted from UTF-8 to UTF-16LE on Windows. That isn't the case. But now I'm asking myself: does file lookup with non-ASCII characters even work there? If I'm not mistaken (again), it doesn't either.

elizagamedev commented 10 years ago

According to the documentation, PhysFS accepts UTF-8, so it should work on Windows transparently.
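For reference, the UTF-8 to UTF-16LE conversion discussed above looks roughly like this in Ruby. Whether PhysFS performs an equivalent step internally on Windows is an assumption here, not something this thread confirms; `WIN_NAME_UTF8` and `to_wide` are hypothetical names.

```ruby
WIN_NAME_UTF8 = "\u304C.png"  # "が.png" in NFC, encoded as UTF-8

# Hypothetical helper: the kind of conversion a UTF-8 API would need
# before calling a wide-character (UTF-16LE) file API.
def to_wide(name)
  name.encode(Encoding::UTF_16LE)
end

wide = to_wide(WIN_NAME_UTF8)

# が is U+304C, so its UTF-16LE bytes are 0x4C 0x30.
p wide.bytes.first(2)  # [76, 48]

# The conversion is lossless: converting back yields the original string,
# and the normalization form (NFC) is untouched.
p wide.encode(Encoding::UTF_8) == WIN_NAME_UTF8  # true
```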