crnormand / gurps

Implementing a GURPS 4e game aid for Foundry VTT
MIT License

Unicode characters don't work #851

Closed AndrewChe7 closed 3 years ago

AndrewChe7 commented 3 years ago

I am trying to import XML with a <?xml version="1.0" encoding="UTF-8"?> header, but the import doesn't accept Unicode characters.

crnormand commented 3 years ago

In truth, we ignore the XML header. We aren't actually running the text through an XML parser. The file format is a holdover from the Fantasy Grounds import format (which WAS based on XML), and it is just too much work (at the moment) to change it.

We support Unicode in a lot of places... where is it not working for you? Do you have an example .gca4 or .gcs file you can zip up and attach to this issue?

AndrewChe7 commented 3 years ago

OK, thanks. I am trying to generate the same XML that GCS generates, but it does not support the Russian language.

AndrewChe7 commented 3 years ago

So changing readAsBinaryString to readAsText in line fixes the problem.

crnormand commented 3 years ago

Unfortunately, we can't. The comment for the method explains:

/**
 * Read text data from a user provided File object
 * Stolen from Foundry, and replaced 'readAsText' with 'readAsBinaryString' to save unicode characters.
 * @param {File} file           A File object
 * @return {Promise.<String>}   A Promise which resolves to the loaded text data
 */

We specifically do NOT call readAsText() because it was destroying all of the Unicode characters. We have to read it as binary.

AndrewChe7 commented 3 years ago

The fact is that readAsBinaryString destroys all the Unicode characters. The documentation says that readAsText uses UTF-8 by default.
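
To illustrate the point above, here is a minimal sketch of what each read method does to the same bytes. It does not use FileReader itself, so it can run outside a browser: toBinaryString and toUtf8Text are hypothetical helpers that mimic the behaviour of readAsBinaryString (one character per byte) and readAsText with its default encoding (UTF-8). The sample word is an assumption, not content from any attached file.

```javascript
// Hypothetical sample: the UTF-8 bytes of a six-letter Russian word.
const bytes = new TextEncoder().encode("Привет");

// Mimics readAsBinaryString: every byte becomes one character (code point 0-255).
function toBinaryString(byteArray) {
  let out = "";
  for (const b of byteArray) out += String.fromCharCode(b);
  return out;
}

// Mimics readAsText with no encoding argument: the bytes are decoded as UTF-8.
function toUtf8Text(byteArray) {
  return new TextDecoder("utf-8").decode(byteArray);
}

const asBinary = toBinaryString(bytes);
const asText = toUtf8Text(bytes);

console.log(asBinary.length); // 12: each two-byte Cyrillic letter became two characters
console.log(asText.length);   // 6
console.log(asText);          // Привет
```

The binary string is not lost data, since the original bytes can still be recovered from it, but anything that displays it directly shows two Latin-1 characters per Cyrillic letter.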

mjeffw commented 3 years ago

Give us the file you want to import and we'll give it a try with readAsText.

AndrewChe7 commented 3 years ago

For example, this: test5.xml.zip

AndrewChe7 commented 3 years ago

So, have you tested with readAsText?

crnormand commented 3 years ago

Yes... and your test file works when we swap the methods. I am currently trying to find the test case that made me make the change in the first place ;-)

If I can't find anything, I will revert the code to readAsText() so your characters can import correctly.

crnormand commented 3 years ago

And I found it. GCA forces export as ISO-8859, so we needed to read the file as binary to get around this. So, what I will do is create a system setting to allow you to choose which file "read" method you would prefer (ISO-8859 or UTF-8).
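
A rough sketch of why a per-file choice is needed, not the code from the actual change: decodeImport and the ENCODINGS values are hypothetical names, and the sketch assumes "ISO-8859" means ISO-8859-1 (Latin-1), which is what a byte-per-character binary read amounts to. The same text is encoded differently by the two exporters, so each file only decodes correctly with the matching method.

```javascript
// Hypothetical setting values; the names are illustrative only.
const ENCODINGS = { ISO8859: "iso-8859-1", UTF8: "utf-8" };

// Decode raw file bytes according to the chosen setting.
function decodeImport(bytes, encoding) {
  if (encoding === ENCODINGS.UTF8) {
    return new TextDecoder("utf-8").decode(bytes);
  }
  // ISO-8859-1 (assumed): each byte maps to the code point with the same value.
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// "é" is the single byte 0xE9 in ISO-8859-1 and the two bytes 0xC3 0xA9 in UTF-8.
const latin1File = Uint8Array.of(0x43, 0x61, 0x66, 0xe9);     // "Café" as ISO-8859-1
const utf8File = Uint8Array.of(0x43, 0x61, 0x66, 0xc3, 0xa9); // "Café" as UTF-8

const latin1Ok = decodeImport(latin1File, ENCODINGS.ISO8859); // "Café"
const utf8Ok = decodeImport(utf8File, ENCODINGS.UTF8);        // "Café"
const mismatch = decodeImport(latin1File, ENCODINGS.UTF8);    // "Caf\uFFFD"

console.log(latin1Ok, utf8Ok, mismatch);
```

The mismatch case ends in U+FFFD, the replacement character, because a lone 0xE9 byte is not valid UTF-8. That matches the earlier report of readAsText destroying characters in an ISO-8859 export.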

crnormand commented 3 years ago

Done. https://github.com/crnormand/gurps/pull/868