DotSpatial / DotSpatial

Geographic information system library written for .NET
MIT License

Code page files with shapefiles #788

Open mogikanin opened 8 years ago

mogikanin commented 8 years ago

I have a problem when I need to use a code page file (.cpg) together with my .shp file.

In this case the .cpg file is not exported along with the shapefile, and the data in the exported file is displayed with the wrong encoding.

Any idea how to change the code to export the .cpg file too?

p.s. copied from #722

mkaring commented 8 years ago

Currently it is not possible to change the encoding to a specific one. DotSpatial always uses the system default encoding when creating a new shapefile. It also does not create an encoding file (.cpg), because according to the standard set by ESRI, the absence of an encoding file means that the system default encoding is supposed to be used.

ahedhi commented 8 years ago

This is a bit of a complicated topic. I added support for reading .cpg files back in 2013 (see ddcbd9cff80079aabcf1cb902e3fa62eaaba3e83). However, I did not include writing them. On the other hand, DotSpatial will attempt to write the LDID (the language driver ID in the DBF file header) according to the AttributeTable's encoding. If you read a shapefile created in DotSpatial in another piece of software (including ArcGIS), the result will depend on whether that software knows how to interpret the LDID.
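To make the LDID part concrete, here is a minimal Python sketch (not DotSpatial code) of how a reader could pick it up. It assumes the usual dBASE header layout, where the language driver ID is the single byte at offset 29; the lookup table is a small, deliberately incomplete illustration, and the function names are hypothetical.

```python
# Sketch only: assumes the common dBASE header layout (LDID at byte offset 29).
LDID_OFFSET = 29
MIN_HEADER_SIZE = 32

# Partial, illustrative mapping from LDID to a Python codec name.
# Real tables are much longer; unknown IDs are simply not listed here.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0x57: "cp1252",  # ANSI
    0xC8: "cp1250",  # Eastern European Windows
    0xC9: "cp1251",  # Russian Windows
}


def read_ldid(dbf_header: bytes) -> int:
    """Return the language driver ID byte from the start of a .dbf file."""
    if len(dbf_header) < MIN_HEADER_SIZE:
        raise ValueError("DBF header is shorter than 32 bytes")
    return dbf_header[LDID_OFFSET]


def codec_for_ldid(ldid: int):
    """Return a codec name for a known LDID, or None if it is not in the table."""
    return LDID_TO_CODEC.get(ldid)
```

A reader that gets `None` back has no encoding hint from the DBF itself, which is exactly the case where a .cpg file next to the shapefile helps.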

ahedhi commented 8 years ago

Note that writing a CPG file is pretty straightforward. It's a text file with the code page (an integer) written to it on a single line. You can basically reverse the code I wrote that is referenced in my previous comment.
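As a rough illustration of how little is involved, the following Python sketch writes and reads such a sidecar file. It is not the DotSpatial implementation; `write_cpg` and `read_cpg` are hypothetical helper names, and the only assumption about the format is the one stated above (the code page on a single line of text).

```python
from pathlib import Path


def write_cpg(shp_path, code_page):
    """Write a .cpg file next to the shapefile, holding the code page on one line."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    cpg_path.write_text(f"{code_page}\n", encoding="ascii")
    return cpg_path


def read_cpg(shp_path):
    """Return the content of the .cpg sidecar as a string, or None if it is missing."""
    cpg_path = Path(shp_path).with_suffix(".cpg")
    if not cpg_path.exists():
        return None
    return cpg_path.read_text(encoding="ascii").strip()
```

For example, `write_cpg("roads.shp", 1251)` would create `roads.cpg` containing `1251`, and `read_cpg("roads.shp")` would return the string `"1251"` for the caller to map to an encoding.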

ahedhi commented 8 years ago

In reply to your comment, @mkaring, a shapefile written with the system default encoding is not portable to another system with a different default encoding. (This is the reason I added cpg file support in the first place: we have clients around the world that give us shapefiles in various encodings because they are created on their Thai, Chinese, Arabic, etc. machines, and then we can't read them.) It would be better to specify the encoding explicitly, including setting the LDID and writing out a CPG, even if it is the default encoding for the current system. AttributeTable already sets the LDID, but other software does not always support it (the same is true of CPG, but a given program may support at least one of the two). A CPG file also lets you override the LDID if it is set incorrectly.
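The portability problem is easy to reproduce outside of any GIS library. The Python sketch below is only an illustration: the Russian sample text and the cp1251/cp1252 pair are my own choice and not taken from a real client file, and `simulate_read` is a hypothetical helper.

```python
def simulate_read(text, writer_codec, reader_codec):
    """Encode text as the writing system would, then decode it as the reader would."""
    raw = text.encode(writer_codec)
    # errors="replace" keeps the sketch from raising on bytes the reader cannot map.
    return raw.decode(reader_codec, errors="replace")


original = "Москва"

# Written on a machine whose default is cp1251, read on one whose default is cp1252.
garbled = simulate_read(original, "cp1251", "cp1252")

# Same bytes, but the reader was told the encoding explicitly (e.g. via a .cpg file).
recovered = simulate_read(original, "cp1251", "cp1251")
```

Here `garbled` no longer matches `original`, while `recovered` does, which is the whole argument for recording the encoding alongside the data.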

mkaring commented 8 years ago

@ahedhi I was just stating how it is currently. I agree that always writing the code page file would be the best and most portable approach, whether or not the file is strictly required. The file does no harm, and it ensures compatibility across systems with different code pages.

mogikanin commented 8 years ago

So, in the end, does anyone think we should do something in DS, or can we close this issue?

mkaring commented 8 years ago

I will implement something to support this when rewriting DotSpatial.Data. I was thinking of introducing some additional settings that allow overriding the encoding used, as well as some kind of compatibility mode flags that make the implementation either follow the strict ESRI standard or aim for the best possible cross-platform compatibility, for example by always writing the encoding file.
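One possible shape for such a flag, sketched in Python purely for discussion: nothing below exists in DotSpatial, and `CompatibilityMode` and `should_write_cpg` are hypothetical names. The strict branch assumes the reading of the ESRI convention given earlier in this thread, namely that a missing .cpg means the system default encoding.

```python
import codecs
from enum import Enum


class CompatibilityMode(Enum):
    """Hypothetical setting: how strictly to follow the ESRI convention."""
    STRICT_ESRI = "strict_esri"        # omit .cpg when it would be redundant
    CROSS_PLATFORM = "cross_platform"  # always write .cpg


def should_write_cpg(mode, encoding, system_default):
    """Decide whether a .cpg file should be written for the given encoding."""
    if mode is CompatibilityMode.CROSS_PLATFORM:
        return True
    # Compare canonical codec names so aliases such as "windows-1252" and
    # "cp1252" are treated as the same encoding.
    return codecs.lookup(encoding).name != codecs.lookup(system_default).name
```

With this split, the encoding override and the "always write the file" behaviour stay independent settings, so a user can keep strict output while still choosing a non-default encoding.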

Please assign this to me and attach it to milestone 2.0.