gazetteerhk / census_explorer

Explore Hong Kong's neighborhoods through visualizations of census data
http://gazetteer.hk
MIT License

Recreate GeoJSON files with correct encoding and properties #22

Open cmkpl opened 10 years ago

cmkpl commented 10 years ago

The Chinese characters in the GeoJSON files cannot be decoded as UTF-8.

EDIT (by @hxu): Official GeoJSON files are encoded in Big5 and only include Traditional Chinese names and English names. We only use the CACODE and DCCODE from the GeoJSON files. We should add in other properties from our final translation maps to create a fully specified GeoJSON file that others can use without our translation maps.

hxu commented 10 years ago

This is correct, but it seems like in my version of the original Shapefiles, the encoding is also corrupted. @2blam in your project (https://github.com/2blam/HK-geojson), how did you get your polygon.json file to be encoded in UTF-8?

As a side note, we actually are not using the information stored in the GeoJSON files -- the TopoJSON file strips out all of the information except the CACODE and DCCODE, which is then used to map against our translation maps. I am somewhat inclined to mark this as wontfix.

hxu commented 10 years ago

Here's what the raw shapefile looks like in QGIS for me:

image

2blam commented 10 years ago

I tried opening the shapefile in my QGIS and got the same problem in the CNAME column. [image: Inline image 3]

I think the original encoding of the shapefile is not UTF-8, so QGIS cannot display the content correctly.

cmkpl commented 10 years ago

I guess the shapefile is encoded in Big5.

2blam commented 10 years ago

Yes, I think so. I converted the JSON to UTF-8; you can have a look at it on my GitHub.

hxu commented 10 years ago

@2blam How did you do the conversion? With QGIS? After exporting from QGIS?

hxu commented 10 years ago

I think the right way to fix this problem is to ignore the official GeoJSON files and generate our own, after our translation mappings are done. The reason is that the official GeoJSON files only include the Traditional Chinese name, whereas we may want to include Simplified Chinese as well. Since we'll have to add in Simplified at some point anyway, might as well add Traditional with the correct encoding at the same time.

2blam commented 10 years ago

To convert the GeoJSON to UTF-8, I wrote a Python script that reads the previous GeoJSON file (Big5 encoding?!) and calls json.dump with the extra argument encoding="utf-8". The output file is then in UTF-8.
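For anyone repeating this on Python 3, here is a minimal sketch of the same idea. It is not the original script: Python 3's json.dump does not accept an encoding argument, so the sketch decodes on read and encodes on write instead. The function name reencode_geojson and the default source encoding "big5" are assumptions for illustration; if some characters fail to decode, Python's "big5hkscs" codec may be worth trying.

```python
import json
from pathlib import Path


def reencode_geojson(src, dst, src_encoding="big5"):
    """Read a GeoJSON file stored in src_encoding and rewrite it as UTF-8.

    src_encoding is an assumption; the thread only guesses that the
    official files are Big5.
    """
    # Decode the raw bytes with the assumed source encoding, then parse.
    text = Path(src).read_text(encoding=src_encoding)
    data = json.loads(text)

    # ensure_ascii=False keeps the Chinese characters as real UTF-8 text
    # instead of \uXXXX escapes.
    with open(dst, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False)
    return data
```

Calling reencode_geojson("in.json", "out.json") with hypothetical file names would leave the geometry untouched and change only how the text is stored on disk.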

debuggingfuture commented 10 years ago

@2blam was referring to the facility data, which is UTF-8. I agree with @hxu. I will put up a translation mapping for S, T, E (Simplified, Traditional, English) of those DC / CA names, which can be added back into the GeoJSON file.

e.g. "ENAME": "Tsuen Wan Rural West", "CNAME": "荃灣郊區西", "SCNAME": "荃湾郊区西"

hxu commented 10 years ago

I would recommend "ENAME", "SNAME" and "TNAME" in keeping with the translation file convention.
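A rough sketch of how the names could be merged back into the GeoJSON features under that convention. The shape of the translation map (a dict keyed by CACODE), the function name add_names, and the placeholder code "X01" are all assumptions for illustration; the real translation files may be structured differently.

```python
import copy

NAME_FIELDS = ("ENAME", "SNAME", "TNAME")


def add_names(geojson, translations, key="CACODE"):
    """Return a copy of geojson with name properties added to each feature.

    translations is assumed to map a code (e.g. a CACODE) to a dict that
    holds ENAME, SNAME and TNAME. Features without a match are left as-is.
    """
    enriched = copy.deepcopy(geojson)  # do not mutate the caller's data
    for feature in enriched["features"]:
        props = feature["properties"]
        names = translations.get(props.get(key))
        if names is None:
            continue
        for field in NAME_FIELDS:
            props[field] = names[field]
    return enriched


if __name__ == "__main__":
    # "X01" is a made-up placeholder, not a real CACODE.
    translations = {
        "X01": {
            "ENAME": "Tsuen Wan Rural West",
            "SNAME": "荃湾郊区西",
            "TNAME": "荃灣郊區西",
        }
    }
    collection = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"CACODE": "X01"}, "geometry": None}
        ],
    }
    print(add_names(collection, translations)["features"][0]["properties"])
```

The same function could be run with key="DCCODE" against a district-level map, assuming that map follows the same layout.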