Hi @pablocbre. The tiling pipeline does allow UTF-8 files. Can you give me the asset number so I can test it out?
20547.
The error message states:
Additional Information: invalid byte 'A' at position 2 of a 2-byte sequence, Line 1, Column 1
Hi @pablocbre. Looks like the file is in ISO-8859-1 rather than UTF8.
file -i 20547.gml
20547.gml: text/plain; charset=iso-8859-1
I converted this to UTF8 using
iconv -f ISO_8859-1 -t UTF8 20547.gml -o 20547_utf8.gml
The UTF8 file then works correctly through the tiler and the properties include UTF8 text.
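For anyone who prefers to script the conversion, here is a minimal Python sketch of the same idea as the iconv command above. The function name `latin1_to_utf8` and the sample string are only illustrative, not part of the tiler. It also shows why the tiler's UTF-8 decoder complains: in ISO-8859-1, Ó is the single byte 0xD3, which a UTF-8 decoder reads as the start of a 2-byte sequence, so the plain ASCII byte that follows is rejected. That is consistent with the "invalid byte ... of a 2-byte sequence" message, though the exact byte in the original file is an assumption.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (illustrative helper)."""
    # Every byte value maps to a character in ISO-8859-1, so this decode
    # never fails. Only call it on data known to be ISO-8859-1: running it
    # on bytes that are already UTF-8 would produce mojibake.
    return data.decode("iso-8859-1").encode("utf-8")


# "DIPUTACIÓ 353" as it would be stored in an ISO-8859-1 file.
sample = "DIPUTACIÓ 353".encode("iso-8859-1")

try:
    sample.decode("utf-8")
    already_utf8 = True
except UnicodeDecodeError:
    # 0xD3 announces a 2-byte sequence, but the next byte is a space.
    already_utf8 = False

converted = latin1_to_utf8(sample)
```

After conversion, `converted.decode("utf-8")` gives back the original text, and the Ó occupies two bytes (0xC3 0x93) instead of one.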
Hi @shehzan10. Okay, that means I got confused and I was actually already updating my assets in UTF8 (such as in assets 20541).
The original question was motivated by a problem that is still present. Using the same Sandcastle you recently shared with me in a different issue, when I print some buildings' metadata, I get strange behaviour:
Buildings in a CityGML have an attribute called 'building', where they store their name.
I upload them to Ion and consume them as 3D Tiles. When printing the feature properties in the CesiumJS viewer, most of them show fine. So 'Town Hall' prints as 'Town Hall'.
But those with non-ascii chars only store the chars after them. So, 'Gúeran' prints as 'úeran' and 'Corporación 3' prints as 'ón 3'.
If you look for specific examples back on the original CityGML file, the names appear correctly ('Gúeran' is 'Gúeran' and 'Corporación 3' is 'Corporación 3').
So, that's the issue I'm facing. I'm not sure about whether I can give you any extra details that could help.
Hi @pablocbre
I don't see either of the buildings you mentioned in the source files of either 20541 or 20547.
In the 20547 file, one example I found was building: DIPUTACIÓ 353, which is printed correctly.
However, this may still be an issue. Do you think you can isolate the 2 buildings you mentioned into a separate CityGML file and upload it? We can run some further testing on it.
PS. I downloaded 20541 and the encoding seems to be ASCII, not UTF-8.
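A rough Python sketch of the kind of check behind that observation, for anyone without the `file` command at hand. The function name `classify_encoding` is hypothetical, and the check only separates three cases; it does not try to guess which legacy encoding a non-UTF-8 file uses.

```python
def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'not utf-8' (illustrative)."""
    # Pure ASCII is byte-for-byte identical in ASCII, ISO-8859-1, and UTF-8,
    # so a file with no bytes above 0x7F cannot be told apart from UTF-8.
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"
```

To use it on a file, pass the raw bytes, for example `classify_encoding(open(path, "rb").read())`, where `path` is whatever local copy of the CityGML file you have.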
Hi @shehzan10, if you:
Use the Sandcastle example to visualize asset 20541
Use the geocoder with the following text (353 Cl Diputacio, Barcelona, Spain) to travel to that building.
Print the metadata on the building.
You'll see the same "DIPUTACIÓ 353" being cut off (and appearing as "Ó 353").
Hi @pablocbre
I think this is a result of the file being in ASCII encoding. In the 20541 asset, the string attribute is
<gen:stringAttribute name="building"><gen:value>PASSEIG SANT JOAN 39-41/DIPUTACIÓ 355</gen:value></gen:stringAttribute>
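Note: in the source file, the Ó in this value is not stored as the literal character. It is written as an XML numeric character reference for Ó (&#211; in decimal form, &#xD3; in hexadecimal). See https://en.wikipedia.org/wiki/%C3%93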
Whereas in the 20547 asset, in both ISO-8859-1 and UTF-8, the attribute is
<gen:stringAttribute name="building"><gen:value>PASSEIG SANT JOAN 39-41/DIPUTACIÓ 355</gen:value></gen:stringAttribute>
I am able to reproduce DIPUTACIÓ 353 being shown as Ó 353 for 20541. But in 20547, DIPUTACIÓ 353 is printed correctly.
I also tried converting the 20541 ASCII file to UTF-8 and ISO-8859-1, but since ASCII is a subset of both encodings, iconv leaves the file unchanged.
I then tried to find-and-replace every numeric character reference for Ó with the literal Ó character in the file. Running the result through the tiler also produces the correct strings.
So I think your best option is to export UTF-8.
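If re-exporting is not possible, the find-and-replace workaround described above can be generalized. The following is a minimal sketch, not what the tiler does internally: the function name `expand_non_ascii_ncrs` is made up for illustration, and it assumes the references appear in decimal (`&#211;`) or hexadecimal (`&#xD3;`) form. References to ASCII characters such as `&#38;` are deliberately left alone, because expanding them could break the XML markup.

```python
import re

# Matches decimal (&#211;) and hexadecimal (&#xD3;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def expand_non_ascii_ncrs(text: str) -> str:
    """Replace non-ASCII numeric character references with literal characters."""

    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references (e.g. &#38; for '&', &#60; for '<') untouched,
        # and skip values that are not encodable code points.
        if code < 0x80 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return _NCR.sub(_replace, text)


value = expand_non_ascii_ncrs("PASSEIG SANT JOAN 39-41/DIPUTACI&#211; 355")
```

The result should then be written back out as UTF-8, for example with `open(out_path, "w", encoding="utf-8")`, where `out_path` is a hypothetical output file. If the file has an XML declaration with an encoding attribute, that attribute should agree with the encoding actually written.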
We're migrating the issues in this repository to the Cesium community forum to consolidate questions, Cesium ion feature requests, and bug reports, so that there is one place to search for answers.
I believe the original issue was resolved here, but if you have any follow up questions please post in the community forum!
Hi guys. I tried to upload a CityGML encoded with UTF-8 and an error occurred when tiling.
I was wondering if Ion supports UTF-8. If not, how can we upload content with characters such as á é ü etc?
Thank you