NetTopologySuite / NetTopologySuite.IO.ShapeFile

The ShapeFile IO module for NTS.

Chinese character transcoding error #39

Closed. DeliciousExtra closed this issue 1 month ago

DeliciousExtra commented 4 years ago

In the DbaseFileHeader class, line 311 reads

string name = DbaseEncodingUtility.Latin1.GetString(buffer, 0, buffer.Length);

When the field name contains Chinese characters, this does not return the correct value. It should instead decode with the encoding detected on line 294,

var encoding = DetectEncoding(ldid, cpgStreamProvider);

that is,

string name = encoding.GetString(buffer, 0, buffer.Length);

With that change the correct value is returned.
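A minimal, standalone sketch of the mis-decoding described above. It does not use the library: the field name "名称", the class name, and the 11-byte zero-padded buffer (the size of a DBF field-name slot) are assumptions chosen for illustration. On .NET Core and .NET 5+ the GBK code page has to be registered first; on .NET Framework the registration line needs the System.Text.Encoding.CodePages package or can be dropped.

```csharp
using System;
using System.Text;

class FieldNameDecodingDemo
{
    static void Main()
    {
        // GBK is not built into .NET Core / .NET 5+; register the code-pages provider first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gbk = Encoding.GetEncoding("GBK");
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Hypothetical field name, zero-padded to 11 bytes like a DBF field-name slot.
        byte[] buffer = new byte[11];
        byte[] encoded = gbk.GetBytes("名称");
        Array.Copy(encoded, buffer, encoded.Length);

        // Decoding the same bytes with two encodings: only GBK recovers the name.
        string wrong = latin1.GetString(buffer, 0, buffer.Length).TrimEnd('\0');
        string right = gbk.GetString(buffer, 0, buffer.Length).TrimEnd('\0');

        Console.WriteLine(wrong == "名称"); // False
        Console.WriteLine(right == "名称"); // True
    }
}
```

The bytes are identical in both calls; only the encoding used for decoding differs, which is why switching line 311 to the detected encoding should be sufficient.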

axmand commented 3 years ago

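// Workaround: the field names were decoded with the wrong encoding, so re-decode them as GBK.
var gbk = Encoding.GetEncoding("GBK");
for (int i = 0; i < length; i++)
{
    // Recover the raw bytes from the mis-decoded name, then decode them as GBK.
    byte[] raw = reader.DbaseHeader.Encoding.GetBytes(reader.DbaseHeader.Fields[i].Name);
    keys[i] = gbk.GetString(raw);
}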

This works well as a temporary workaround.
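A rough sketch of why this kind of round trip can work, assuming the name was decoded as Latin-1 (ISO-8859-1), which maps every byte value to exactly one character and back, so the original bytes survive. The sample string "字段" and the class name are hypothetical. If the header's encoding is not a lossless single-byte encoding, the original bytes may not be recoverable this way.

```csharp
using System;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // Needed on .NET Core / .NET 5+ so that "GBK" can be resolved.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gbk = Encoding.GetEncoding("GBK");
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        byte[] original = gbk.GetBytes("字段");          // bytes as stored in the file
        string garbled = latin1.GetString(original);     // what the Latin-1 decode produces
        byte[] recovered = latin1.GetBytes(garbled);     // Latin-1 round trip restores the bytes
        string repaired = gbk.GetString(recovered);      // decode again with the right encoding

        Console.WriteLine(repaired == "字段"); // True
    }
}
```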

DGuidi commented 3 years ago

Can you post a complete unit test, or at least a complete piece of code that we can use to build a unit test with Chinese characters? Thanks.

DGuidi commented 3 years ago

OK, I see there is probably a problem here.

In L310:

// NOTE: only this _encoding.GetString method is available in Silverlight
string name = DbaseEncodingUtility.Latin1.GetString(buffer, 0, buffer.Length);

But in L294:

var encoding = DetectEncoding(ldid, cpgStreamProvider);
if (_encoding == null) _encoding = encoding;

So I suppose the correct code for L310 might be:

// NOTE: Silverlight is gone, baby... forget and move on
string name = _encoding.GetString(buffer, 0, buffer.Length);

DGuidi commented 3 years ago

@FObermaier please take a look here as well :)

DGuidi commented 3 years ago

@FObermaier @airbreather how about this fix? I can easily push this change if there are no objections.

airbreather commented 3 years ago

@DGuidi LGTM ✔

Edit: please just make sure there's a test for it.

DGuidi commented 3 years ago

@axmand could you please provide at least some test data so I can build a valid test for this issue?

DGuidi commented 3 years ago

Please check whether the last commit fixes the issue as expected.

KubaSzostak commented 1 month ago

Support for different encodings has been added in the successor library. There is also sample code demonstrating how to use a custom encoding.