exch-bms2 / beatoraja

Cross-platform rhythm game based on Java and libGDX.
GNU General Public License v3.0
652 stars 148 forks source link

File I/O should use UTF-8, rather than system default encoding #360

Closed SkywaveTM closed 11 months ago

SkywaveTM commented 6 years ago

It is a real problem across the project. It leads to unexpected data loss or malfunctions.

Here is an example. For now, if options in a skin are named in Japanese, its values are not properly saved into/loaded from player config files in systems where Japanese is not their primary language. It causes skin setting are not properly saved across sessions and default values are selected whenever the program is booted up. In worst cases, the program die unexpectedly. Those examples are the things I suffered with beatoraja 0.6.2 and LITONE5 0.9 (180820) in Korean Windows 10.

I tried to fix it by myself, but there are some problems.

  1. File IO is not managed by specific class, rather every single classes are opening (and not closing) files. It means that we need to find all statements that opening any files.
  2. If we change the encoding, previous files will not be properly read.

Due to those problems, I just tested a quite-fix without a commit -- changing user config files to use UTF-8 and creating a new user profile -- and the problems I suffered were gone.

Here's the code snippet I tried in bms.player.beatoraja.PlayerConfig.java.

before

    public static PlayerConfig readPlayerConfig(String playerpath, String playerid) {
        PlayerConfig player = new PlayerConfig();
        Path p = Paths.get(playerpath + "/" + playerid + "/config.json");
        Json json = new Json();
        try {
            json.setIgnoreUnknownFields(true);
            player = json.fromJson(PlayerConfig.class, new FileReader(p.toFile()));
            player.setId(playerid);
            player.validate();
        } catch(Throwable e) {
            e.printStackTrace();
        }
        return player;
    }

    public static void write(String playerpath, PlayerConfig player) {
        Json json = new Json();
        json.setOutputType(JsonWriter.OutputType.json);
        Path p = Paths.get(playerpath + "/" + player.getId() + "/config.json");
        try (FileWriter fw = new FileWriter(p.toFile())) {
            fw.write(json.prettyPrint(player));
            fw.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

after

    public static PlayerConfig readPlayerConfig(String playerpath, String playerid) {
        PlayerConfig player = new PlayerConfig();
        Path p = Paths.get(playerpath + "/" + playerid + "/config.json");
        Json json = new Json();
        try (Reader fr = new InputStreamReader(
                new FileInputStream(p.toFile()), StandardCharsets.UTF_8
        )) {
            json.setIgnoreUnknownFields(true);
            player = json.fromJson(PlayerConfig.class, fr);
            player.setId(playerid);
            player.validate();
        } catch(IOException e) {
            e.printStackTrace();
        }
        return player;
    }

    public static void write(String playerpath, PlayerConfig player) {
        Json json = new Json();
        json.setOutputType(JsonWriter.OutputType.json);
        Path p = Paths.get(playerpath + "/" + player.getId() + "/config.json");
        try (Writer fw = new OutputStreamWriter(
                new FileOutputStream(p.toFile()), StandardCharsets.UTF_8
        )) {
            fw.write(json.prettyPrint(player));
            fw.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
SkywaveTM commented 6 years ago

Oh... I notice that same issue was opened about 1 year ago. https://github.com/exch-bms2/beatoraja/issues/81

However, I'll not close this issue for now to alert other developers. THIS IS A REALLY HUGE PROBLEM.

wcko87 commented 6 years ago

In fact this issue was opened yet another time at #318.

318 also mentions a similar problem with the encoding of folder names for BMS files. (it crashes beatoraja when trying to play the song twice)

And yes this is a huge problem.

staraxis1684 commented 6 years ago

I think this problem also concerns with the problem at #143 .(This problem haven't resolved until now) While the application runnning, it seems that this gets the controller device name by MS932, on the other hand, it looks like that displays by other characterset (probably it's UTF-8) at key config and also writes the config file by SHIFT-JIS.

This inconsistency seems to be due to the fact that the character code is not clarified in various parts iof the application.

(日本語で補足しておくと、IIDX Premiumコントローラーは「2 軸 16 ボタン ジョイスティック」とDirectInput上では認識しているようですが、これがbeatorajaのキーコンフィグではこのように文字化けし、設定ファイルではさらに別の文字コードとして「2 ?? 16 ?{?^?? ?-2」と文字化けしているようです。ゲーム上の文字化けは仕方ないかもしれませんが、設定ファイルとJSONファイルの文字コード等が合っていないことが問題な気がします。 この問題が解決すると個人的には非常に助かります。よろしくお願いいたします。)

parataxia commented 6 years ago

There is an explicit override for Java to use UTF-8 exclusively for all I/O operations, should be useful if used persistently: -Dfile.encoding="UTF-8"

Add it to "_JAVA_OPTIONS" environment variable in your batch/command file.

edit Of course we're still faced with the issue of determining specific byte codes for proper charset implementation (i.e, UTF-8/SHIFT-JIS) which gives us arbitrary (it needed to had been solved earlier) solutions. Although the environment variable solution is good, I am scared to commit permanent change to given start-up script files because it would break compatibility possibly between users. Maybe a commented out appendage for the "_JAVA_OPTIONS" instead?

Furthermore there are "guessing" algorithms that use odd statistical analysis for trying to detect character encoding properly, I would think with config & BMS files/folder names it would make a poor solution because the data is too noisy (or not enough?) with indeterminable characters and there are more opaque characters available than unique chars. Also: slow AF I bet and too many files (so sometimes likely to fail as well.) Curious if this helps people because I noticed I posted earlier on accident, oops. Hopefully it helps the config problem at least, the rest need work!

wcko87 commented 5 years ago

The UTF-8 encoding flag that parataxia suggested works very well. I am now running beatoraja in US locale with absolutely no issues.

I added it to my .bat file and have had no encoding problems ever since. (I didn't lose any of my scores, replays, songdb, config or anything else either)

979CHAN commented 5 years ago

Hello, I don't know programming. So I ask. I've added -Dfile.encoding="UTF-8" to .bat, but when I re-run it, the skin settings and IR settings are all reset. I added it after _JAVA_OPTIONS=' and am I wrong? I do not have a place to ask, so I post a question on this side. (Google Translate)

wcko87 commented 5 years ago

Hello, The _JAVA_OPTIONS line should look like this:

set _JAVA_OPTIONS='-Dsun.java2d.opengl=true -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel' -Dfile.encoding="UTF-8"

(yes, the -Dfile.encoding="UTF-8" is placed outside of the quotes ' '. It's odd, but it doesn't seem to work inside the quotes)

Skin settings might reset after switching to UTF-8, because the skin settings may have been saved in the default encoding. I think it shouldn't be too much trouble to set the skin settings again, since you only need to do this once. I'm not sure about IR settings.

Also after switching to UTF-8, if you experience any crashes when loading the skins in game, try switching to the default skin, and back to your current skin. This refreshes the settings and saves them again in the correct encoding.

There should be no more crashes / issues after that.

Regarding places to ask questions, there is the JP BMS Discord, and the International BMS Discord. But I will answer this here because I think this answer may be useful to other people.

979CHAN commented 5 years ago

Thanks for the answer, but I have another problem. In the cmd window that appears when you run the bat file, what you originally saw in Japanese is broken and looks like Korean. Also, the loading is significantly slower than before. If you select a folder that contains songs, the songs will not appear and only the folders except the songs will be displayed. If you select that other folder, you can not select it. And the most important skin problem can not be solved. What should I do? I do not know where the Discord is, so I ask again here. (Google Translate)

979CHAN commented 5 years ago

References Links Thank you! But now I think that when I run it on my laptop, the setting has not been reset even though it is the same Korean locale. What happened? (Google Translate)

979CHAN commented 5 years ago

Also, if you run it as config.bat, it will recognize the controller, but will not recognize the button in the key setting. If you change the locale to Japanese, the initialization symptom remains. (Google Translate)

wcko87 commented 5 years ago

Your PC locale should not matter if you have correctly configured the UTF-8 mode. I suggest checking that you have set the UTF-8 mode flag correctly.

979CHAN commented 5 years ago

image This is done by applying UTF-8. When I did not apply it, the Japanese that did not break was broken. Is it normal? No configuration information is saved even when this is applied. (Google Translate)

galvn commented 3 years ago

Hello. I'm testing beatoraja 0.8.1 under Traditional-Chinese version of Windows 7

After adding -Dfile.encoding="UTF-8" into config.bat, two issues instantly solved: -BGA videos file that are placed under a path containing non-unicode characters that is of not system locale(in my case Japanese) are not played. -Skin settings can't be saved. And Japanese in the log windows Japanese are now displayed correctly.

So I second the idea to have -Dfile.encoding="UTF-8" in the _JAVA_OPTIONS as default.

wcko87 commented 3 years ago

Yes, my beatoraja english guide now also recommends that everyone switches to UTF-8 mode before starting playing beatoraja. https://github.com/wcko87/beatoraja-english-guide/wiki#locale-fix

But when you switch to UTF-8 mode the first time, the player1/config.json file may get "corrupted" and so beatoraja may not start (the config.json file has to be replaced).