import of wide-character (e.g. UTF-16) tabular data fails

BSBI / ddb-issues

Public bug tracker for the BSBI Database (DDb)


import of wide-character (e.g. UTF-16) tabular data fails #52

Open japonicus opened 6 years ago

japonicus commented 6 years ago

The tabular data importer ought to read the BOM (byte order mark). It currently guesses the encoding line by line, which is inefficient and also fails for UTF-16 data (and presumably also for UTF-32).
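For illustration only, here is a minimal Python sketch of BOM-based detection. It is not the DDb importer's code; the function name `detect_bom` and the UTF-8 fallback are assumptions made for this example. It checks the UTF-32 marks before the UTF-16 marks, because the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.

```python
import codecs

# UTF-32 marks are listed before UTF-16 marks: the UTF-32 LE BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(head: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) for the first bytes of a file.

    Hypothetical helper: falls back to `default` (an assumption) with a
    BOM length of 0 when no BOM is present.
    """
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding, len(bom)
    return default, 0
```

Reading the first four bytes of the file is enough input for this check. One known limitation: a UTF-16 LE file whose first character after the BOM is NUL would be misread as UTF-32 LE.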

japonicus commented 6 years ago

Partially fixed. The BOM is now read.

Files are still read line by line, splitting on a single-byte CR terminator, which is wrong and error-prone for wide encodings, where the terminator occupies two or more bytes.

The loader should take account of character width while loading files. (The obvious whole-file-at-once approach is not an option, as we need to retain the capability to import huge files.)

Still to do: rewrite the file loader to load in chunks and call back for each line.
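As a rough sketch of that remaining work (in Python for illustration, not the DDb loader itself), the function below reads a binary stream in fixed-size chunks, decodes it with an incremental decoder so that multi-byte characters split across chunk boundaries are handled, and invokes a callback once per line. The name `for_each_line`, its parameters, and the choice of CRLF, CR and LF as terminators are all assumptions made for this example. The encoding is assumed to have been determined from the BOM already, with the stream positioned just past it.

```python
import codecs
import io
import re

# Assumed set of line terminators for tabular text: CRLF, lone CR, lone LF.
_EOL = re.compile(r"\r\n|\r|\n")


def for_each_line(stream, encoding, callback, chunk_size=64 * 1024):
    """Decode `stream` (binary) chunk by chunk and call callback(line) per line.

    Hypothetical helper: terminators are stripped before the callback runs.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""  # decoded text not yet emitted as a complete line
    while True:
        chunk = stream.read(chunk_size)
        final = not chunk
        # The decoder buffers any incomplete multi-byte character itself.
        pending += decoder.decode(chunk, final)
        # A trailing CR may be the first half of a CRLF split across chunks,
        # so hold it back until more data arrives.
        if not final and pending.endswith("\r"):
            text, held = pending[:-1], "\r"
        else:
            text, held = pending, ""
        parts = _EOL.split(text)
        for line in parts[:-1]:
            callback(line)
        # The last part has no terminator yet, so keep it for the next round.
        pending = parts[-1] + held
        if final:
            if pending:
                callback(pending)  # last line without a trailing terminator
            break


if __name__ == "__main__":
    sample = "a,b\r\nc,d\r\n".encode("utf-16-le")
    # A deliberately tiny, odd chunk size splits characters across chunks.
    for_each_line(io.BytesIO(sample), "utf-16-le", print, chunk_size=3)
```

Memory use stays bounded by the chunk size plus the longest single line, so huge files can still be imported without loading them whole.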