Monstrofil / replays_unpack

Encoding issues regarding Chinese player names #30

Open HenryQuan opened 8 months ago

HenryQuan commented 8 months ago

Hi, I found a minor issue with Chinese player names while working on a PR that also uses this library: https://github.com/WoWs-Builder-Team/minimap_renderer/pull/14. The issue is specific to the CN realm, which allows Chinese characters in player names. As far as I know, this isn't possible on other realms.

Under battle_controller, we may want to add an extra encode/decode step for player names so that Chinese characters come out correctly. This doesn't seem to affect English names.

name=player["name"].encode('ISO8859-1').decode('UTF-8'),

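To illustrate why that round trip works, here is a minimal sketch. It assumes the garbled name is UTF-8 bytes that were decoded as ISO 8859-1 (Latin-1) somewhere upstream. The helper name `fix_mojibake` is hypothetical and not part of the library.

```python
def fix_mojibake(name: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up (hypothetical helper, not library API).

    Latin-1 maps every byte 0x00-0xFF to one code point, so encoding the
    garbled text back to Latin-1 recovers the original UTF-8 bytes.
    """
    try:
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name was already correct (or is not UTF-8 at all): leave it alone.
        return name


# Simulate the bug: UTF-8 bytes decoded with the wrong codec.
garbled = "中文".encode("utf-8").decode("latin-1")
fixed = fix_mojibake(garbled)
```

The `try`/`except` keeps names that are already correct untouched, which is one way to avoid checking the realm before applying the fix.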
Monstrofil commented 8 months ago

@HenryQuan any chance you can attach a replay with Chinese names to this issue? I will add some unit tests for that case.

HenryQuan commented 8 months ago

@Monstrofil Yes, I have one shared by a friend: 20231214_203306_PFSC210-Marseille_46_Estuary.zip. Oddly, the clan tag can also be in Chinese, but it is decoded correctly. Only the player name is wrong. The Chinese server can be detected by checking that the realm is CN. Chinese characters can be detected using something like hanzidentifier.
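As a rough idea of what such a check could look like without an extra dependency, here is a sketch using only the standard library. It is a stand-in for hanzidentifier, not its API: the function name `has_cjk` is hypothetical, and it only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def has_cjk(text: str) -> bool:
    """Return True if any character is in the basic CJK Unified Ideographs block.

    Hypothetical helper: extension blocks and other scripts are not covered.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


chinese_name = has_cjk("中文")
english_name = has_cjk("Player_1")
```

Note that a still-garbled name contains no CJK characters, so a check like this is mostly useful for verifying the decoded result in a unit test.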

Monstrofil commented 5 months ago

@HenryQuan the UTF-8 issue was caused by an old workaround intended to bridge the difference between how Python 2 (which WG still uses) and Python 3 handle strings. I made a somewhat better variant that loads the pickle as bytes and then recursively searches for bytes values and tries to decode them as Unicode strings. This approach also has problems, e.g. with empty strings, but at least it handles names properly.
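For readers following along, the general technique can be sketched as below. This is an illustration under assumptions, not the code that was committed: `decode_bytes` is a hypothetical name, UTF-8 is assumed as the target encoding, and the sample payload is a hand-written protocol 2 pickle of a Python 2 `str`.

```python
import pickle


def decode_bytes(obj, encoding="utf-8"):
    """Recursively turn bytes into str where they decode cleanly (hypothetical helper)."""
    if isinstance(obj, bytes):
        try:
            return obj.decode(encoding)
        except UnicodeDecodeError:
            return obj  # genuinely binary data stays as bytes
    if isinstance(obj, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_bytes(item, encoding) for item in obj]
    if isinstance(obj, tuple):
        return tuple(decode_bytes(item, encoding) for item in obj)
    return obj


# Hand-written Python 2 style pickle of {'name': '中文'} with 8-bit str values.
PY2_PICKLE = b"\x80\x02}q\x00U\x04nameq\x01U\x06\xe4\xb8\xad\xe6\x96\x87q\x02s."

# encoding="bytes" keeps Python 2 str objects as raw bytes instead of guessing a codec.
raw = pickle.loads(PY2_PICKLE, encoding="bytes")
player = decode_bytes(raw)
```

Loading with `encoding="bytes"` defers the codec decision until after unpickling, so each value can be decoded, or left alone, on its own merits.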

I added this new solution to 13.2 and also backported it to 12.11 (because the replay you sent was from that version), so give it a try once you have time.