CesiumGS / cesium-ion-rest-api-examples

Code examples for using the Cesium ion REST API
https://cesium.com/

Question: utf-8 #146

Closed pablocbre closed 4 years ago

pablocbre commented 5 years ago

Hi guys. I tried to upload a CityGML encoded with UTF-8 and an error occurred when tiling.

I was wondering if ion supports UTF-8. If not, how can we upload content with characters such as á, é, ü, etc.?

Thank you

shehzan10 commented 5 years ago

Hi @pablocbre. The tiling pipeline does allow UTF-8 files. Can you give me the asset number so I can test it out?

pablocbre commented 5 years ago

20547.

The error message states: Additional Information: invalid byte 'A' at position 2 of a 2-byte sequence, Line 1, Column 1

shehzan10 commented 5 years ago

Hi @pablocbre. Looks like the file is in ISO-8859-1 rather than UTF8.

file -i 20547.gml
20547.gml: text/plain; charset=iso-8859-1

I converted this to UTF8 using

iconv -f ISO_8859-1 -t UTF8 20547.gml -o 20547_utf8.gml

The UTF8 file then works correctly through the tiler and the properties include UTF8 text.
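
For reference, the check and conversion above could also be sketched in Python. This is only an illustration under stated assumptions: the helper names `is_valid_utf8` and `latin1_to_utf8` are hypothetical, the sample string is the building name discussed later in this thread, and it assumes the source really is ISO-8859-1 (Python cannot confirm that, since any byte sequence decodes as ISO-8859-1).

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be ISO-8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# "\u00d3" is the letter O with acute accent; in ISO-8859-1 it is the single
# byte 0xD3, which UTF-8 reads as the start of a 2-byte sequence.
sample = "DIPUTACI\u00d3 353".encode("iso-8859-1")

print(is_valid_utf8(sample))                  # the ISO-8859-1 bytes
print(is_valid_utf8(latin1_to_utf8(sample)))  # the re-encoded bytes
```

In the sample, the byte after 0xD3 is a plain space rather than a UTF-8 continuation byte, which is the same kind of failure the tiler reported as an invalid byte in a 2-byte sequence.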

pablocbre commented 5 years ago

Hi @shehzan10. Okay, that means I got confused and I was actually already updating my assets in UTF8 (such as asset 20541).

The original question was motivated by a problem that still persists. Using the same sandcastle you recently shared with me in a different issue, when I print some buildings' metadata, I'm getting a strange behaviour:

So, that's the issue I'm facing. I'm not sure about whether I can give you any extra details that could help.

shehzan10 commented 5 years ago

Hi @pablocbre

I don't see either of the buildings you mentioned in the source files of either 20541 or 20547.

In the 20547 file, one example I found was building: DIPUTACIÓ 353 being printed correctly.

However, this may still be an issue. Do you think you can isolate the 2 buildings you mentioned into a separate CityGML file and upload it? We can run some further testing on it.

PS. I downloaded 20541 and the encoding seems to be ASCII, not UTF-8.

pablocbre commented 5 years ago

Hi @shehzan10, if you:

You'll see the same "DIPUTACIÓ" 353 being cut (and appearing as "Ó 353").


shehzan10 commented 5 years ago

Hi @pablocbre

I think this is a result of the file being in ASCII encoding. In the 20541 asset, the string attribute is

<gen:stringAttribute name="building"><gen:value>PASSEIG SANT JOAN 39-41/DIPUTACI&#211; 355</gen:value></gen:stringAttribute>

Note: &#211; is the numeric character reference for Ó (https://en.wikipedia.org/wiki/%C3%93).

Whereas in the 20547 asset, in both ISO-8859-1 and UTF-8, the attribute is

<gen:stringAttribute name="building"><gen:value>PASSEIG SANT JOAN 39-41/DIPUTACIÓ 355</gen:value></gen:stringAttribute>

I am able to reproduce DIPUTACIÓ 353 being shown as Ó 353 for 20541. But in 20547, DIPUTACIÓ 353 is printed correctly.

I also tried converting the 20541 ASCII file to UTF-8 and ISO-8859-1, but since ASCII is a direct subset of both, iconv leaves the file unchanged.

I then tried to find-and-replace all &#211; with Ó in the file. Running the result through the tiler then also produces the correct strings.

So I think your best option is to export UTF-8.
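
The find-and-replace step described above could be generalized to every numeric character reference with something like the following. This is a rough sketch, not what was actually run on the asset: `expand_numeric_refs` is a hypothetical helper, and it deliberately leaves ASCII references (such as &#38; for an ampersand) untouched so the XML markup stays valid.

```python
import re

# Matches decimal (&#211;) and hexadecimal (&#xD3;) character references.
_NUMERIC_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_numeric_refs(text: str) -> str:
    """Replace non-ASCII numeric character references with literal characters."""

    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references (could be markup such as < or &), surrogate
        # code points, and anything outside the Unicode range as written.
        if code_point < 0x80 or code_point > 0x10FFFF:
            return match.group(0)
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(_replace, text)


line = "<gen:value>PASSEIG SANT JOAN 39-41/DIPUTACI&#211; 355</gen:value>"
print(expand_numeric_refs(line))
```

The output text would then need to be written out as UTF-8 (for example with `open(path, "w", encoding="utf-8")`, where `path` is whatever file you are producing) before uploading.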

OmarShehata commented 4 years ago

We're migrating the issues in this repository to consolidate questions, Cesium ion feature requests, and bug reports on the Cesium community forum. That way there'll be one place to search for answers.

I believe the original issue was resolved here, but if you have any follow-up questions, please post in the community forum!