
cloudflare / py-mmdb-encoder

Create mmdb files to encode prefix lists.

BSD 3-Clause "New" or "Revised" License

30 stars, 11 forks

Corrupted file when a field has non-ascii characters #6

Open bcharron opened 4 years ago

bcharron commented 4 years ago

When I try to create an mmdb file with non-ASCII characters in a field, the resulting file cannot be read. It looks as if the offsets are wrong.

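I think this is because the length written to the file assumes that a Python string's length (its number of characters) is the same as the number of bytes it produces when encoded to UTF-8. That only holds for ASCII; every non-ASCII character encodes to two or more bytes.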
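A quick illustration of the mismatch, using an arbitrary example string. The second half is an assumption about the failure mode, not taken from the library: it shows what a reader sees if the declared size is `len(value)` but all UTF-8 bytes are written.

```python
value = "Montréal"
encoded = value.encode("utf-8")

print(len(value))    # 8: counts characters (code points)
print(len(encoded))  # 9: "é" (U+00E9) takes two bytes in UTF-8

# Illustration only: a reader that trusts a declared size of len(value)
# stops one byte short and gets this:
truncated = encoded[: len(value)]
print(truncated)     # b'Montr\xc3\xa9a'
# The leftover byte b'l' would then be read as the start of whatever
# field comes next in a sequentially decoded structure such as a map.
```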

Computing the length from the encoded string at https://github.com/cloudflare/py-mmdb-encoder/blob/master/mmdbencoder/__init__.py#L346 seems to produce a correct file:

length = len(value.encode('utf-8'))
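For context, the MaxMind DB format stores a UTF-8 string as a control byte (type 2 in the top three bits, size in the low five bits, with 1 to 3 extra size bytes for sizes of 29 and above) followed by the raw bytes, and that size is a byte count. Below is a minimal sketch of encoding one string field with the size taken from the encoded bytes, as in the fix above. `encode_utf8_string` is a hypothetical helper written for illustration; it is not py-mmdb-encoder's actual code.

```python
def encode_utf8_string(value: str) -> bytes:
    """Encode a str as a MaxMind DB UTF-8 string field (type 2).

    Hypothetical helper for illustration, not py-mmdb-encoder's API.
    """
    data = value.encode("utf-8")
    size = len(data)    # byte length of the payload, NOT len(value)
    type_bits = 2 << 5  # type 2 (UTF-8 string) in the top 3 bits -> 0x40

    if size < 29:
        # Small sizes fit directly in the low 5 bits of the control byte.
        header = bytes([type_bits | size])
    elif size < 29 + 256:
        # 29 means: size = 29 + next byte.
        header = bytes([type_bits | 29, size - 29])
    elif size < 285 + 65536:
        # 30 means: size = 285 + next two bytes (big-endian).
        header = bytes([type_bits | 30]) + (size - 285).to_bytes(2, "big")
    elif size < 65821 + (1 << 24):
        # 31 means: size = 65821 + next three bytes (big-endian).
        header = bytes([type_bits | 31]) + (size - 65821).to_bytes(3, "big")
    else:
        raise ValueError("string too long for an MMDB field")

    return header + data
```

With this, `encode_utf8_string("Montréal")` starts with control byte `0x49` (type 2, size 9) rather than `0x48` (size 8), so the declared size matches the nine bytes that follow.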