calvinmetcalf / shapefile-js

Convert a Shapefile to GeoJSON. Not many caveats.
http://calvinmetcalf.github.io/shapefile-js/
MIT License

Fix for encoding #95

Closed K0den closed 3 years ago

K0den commented 6 years ago

I've no idea what to do about the .cpg code though...

calvinmetcalf commented 6 years ago

I don't think this fix is quite right. I think we actually need a fix here that checks whether it's a buffer and, if so, converts it into a string.
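A minimal sketch of the kind of check described above, assuming the .cpg content can arrive either as a Node.js Buffer or as a string. `normalizeEncoding` is a hypothetical helper name used only for illustration; it is not part of shapefile-js.

```javascript
// Hypothetical helper (not from shapefile-js): turn the content of a .cpg
// file into a trimmed encoding label, whether it was passed as a Buffer
// or as a string.
function normalizeEncoding(cpg) {
    if (cpg === undefined || cpg === null) {
        return undefined;
    }
    // A .cpg file normally holds a short ASCII label such as "UTF-8".
    var label = Buffer.isBuffer(cpg) ? cpg.toString('ascii') : String(cpg);
    label = label.trim();
    // Treat an empty file as "no encoding given".
    return label.length ? label : undefined;
}

console.log(normalizeEncoding(Buffer.from('UTF-8\n')));
console.log(normalizeEncoding(' GBK '));
```

The result could then be passed on as the `encoding` argument; an `undefined` result leaves the choice of a default to the caller.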

DistChen commented 3 years ago

I changed the parseDBF implementation so that, when there is no .cpg file, the encoding can be read from the .dbf file itself.

  1. Add ldid (language driver ID) to dbfHeader:

    function dbfHeader(data) {
        var out = {};
        out.lastUpdated = new Date(data.readUInt8(1) + 1900, data.readUInt8(2), data.readUInt8(3));
        out.records = data.readUInt32LE(4);
        out.headerLen = data.readUInt16LE(8);
        out.recLen = data.readUInt16LE(10);
        // Language driver ID, stored at byte offset 29 of the header
        out.ldid = data.readUInt8(29);
        return out;
    }

    In a .dbf file, the byte at offset 29 (zero-based) holds the language driver ID; see dbf_file_fmt.

  2. Modify module.exports:

    module.exports = function(buffer, encoding) {
        var actualEncode = encoding;
        var header = dbfHeader(buffer);
        if(!actualEncode){
            // A full ldid-to-encoding map is needed here; the two entries below are only an example.
            var ldid = header.ldid;
            if(ldid === 77){
                actualEncode = "GBK";
            }else if(ldid === 120){
                actualEncode = "big5";
            }
            /*else if (ldid === value){
                actualEncode = "encode";
            }
           ......  */
        }
        var decoder = createDecoder(actualEncode);
        var rowHeaders = dbfRowHeader(buffer, header.headerLen - 1, decoder);
    
        var offset = ((rowHeaders.length + 1) << 5) + 2;
        var recLen = header.recLen;
        var records = header.records;
        var out = [];
        while (records) {
            out.push(parseRow(buffer, offset, rowHeaders, decoder));
            offset += recLen;
            records--;
        }
        return out;
    };

    For the mapping between language driver IDs and code pages, see File code page identifiers.
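The if/else chain above could also be written as a lookup table. The following is only a sketch: `LDID_TO_ENCODING` and `encodingFromLdid` are hypothetical names, the table is deliberately partial, and the encoding labels are assumed to be ones the decoder in use accepts.

```javascript
// Partial, illustrative map from DBF language driver ID to an encoding label.
// Only a handful of common IDs are listed; a real table would need many more.
var LDID_TO_ENCODING = {
    3: 'windows-1252',   // Windows ANSI
    77: 'GBK',           // Chinese GBK (PRC), code page 936
    87: 'windows-1252',  // ANSI
    120: 'big5',         // Chinese (Hong Kong SAR, Taiwan), code page 950
    123: 'shift_jis',    // Japanese, code page 932
    200: 'windows-1250', // Eastern European Windows
    201: 'windows-1251'  // Russian Windows
};

// Hypothetical helper: return the label for a known ldid, else the fallback.
function encodingFromLdid(ldid, fallback) {
    if (Object.prototype.hasOwnProperty.call(LDID_TO_ENCODING, ldid)) {
        return LDID_TO_ENCODING[ldid];
    }
    return fallback;
}

console.log(encodingFromLdid(77));
console.log(encodingFromLdid(0, 'windows-1252'));
```

With such a helper, the branch in module.exports would reduce to something like `var actualEncode = encoding || encodingFromLdid(header.ldid);`, so an explicitly passed encoding still takes precedence over the header value.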

Hope it works for you.

calvinmetcalf commented 3 years ago

this has been fixed for a while