Open BioGeek opened 9 years ago
I've added some support for BOMs in the new unicode-support branch. It should use the BOM (if present) to pick the correct encoding. Can you try it out on files that you have?
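For anyone following along, here is a rough sketch of what picking the encoding from a BOM can look like. This is not the code in the unicode-support branch; the names `detect_bom` and `decode_with_bom` are made up for illustration, and the UTF-8 default is an assumption.

```python
import codecs

# Check the longer BOMs first: the UTF-32-LE BOM starts with the same
# two bytes as the UTF-16-LE BOM, so order matters here.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(raw):
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(raw, default="utf-8"):
    """Decode raw file bytes, skipping the BOM if one is present."""
    encoding, skip = detect_bom(raw)
    # Assumption: fall back to `default` when the file has no BOM.
    return raw[skip:].decode(encoding or default)
```

With this, `decode_with_bom(open(path, "rb").read())` would return text without the BOM character at the start, so a line regex sees `0 HEAD` first.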
There are a few other parts to this task that I haven't done yet:
- [...] the HEAD.CHARACTER SET head tag
- [...] the HEAD.CHARACTER SET tag if there is no BOM
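A possible shape for the "no BOM" fallback, as a sketch only. It assumes HEAD.CHARACTER SET corresponds to the `1 CHAR` line under `0 HEAD`, and that the header is ASCII-compatible so it can be searched as raw bytes. `encoding_from_header` and `CHAR_TO_CODEC` are hypothetical names, and the mapping is deliberately incomplete.

```python
import re

# Hypothetical mapping from CHAR values to Python codec names.
# ANSEL is left out: the Python standard library has no ANSEL codec.
CHAR_TO_CODEC = {
    "UTF-8": "utf-8",
    "ASCII": "ascii",
    "ANSI": "cp1252",  # assumption: seen in exported files, meaning varies
}

# Matches a level-1 CHAR line and captures its value, e.g. b"UTF-8".
_CHAR_LINE = re.compile(rb"^\s*1\s+CHAR\s+(\S+)", re.MULTILINE)


def encoding_from_header(raw, default="utf-8"):
    """Guess a codec from the header's CHAR line; return `default` if unknown."""
    # Only scan the start of the file; the header comes first.
    match = _CHAR_LINE.search(raw[:4096])
    if match is None:
        return default
    value = match.group(1).decode("ascii", errors="replace").upper()
    return CHAR_TO_CODEC.get(value, default)
```

Passing `default=None` makes it easy to tell "unknown or unsupported character set" apart from a real match, which seems safer than silently guessing.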
Sites like geni.com let you export GEDCOM files that start with a Byte Order Mark (BOM).
Currently the regex fails for such files and you get a NotImplementedError.
See this detailed article for more about GEDCOM & the Unicode Byte Order Mark.
I'm currently toying with a solution like the one described here to remove the BOM and encode/decode the string, but I still get strange characters in the output.
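In case it helps, here is a minimal sketch of stripping a UTF-8 BOM with Python's built-in `utf-8-sig` codec instead of removing it by hand. It assumes the exported file really is UTF-8; `strip_bom` is just an illustrative name. Strange characters in the output may come from decoding the bytes with the wrong codec, which the last lines try to show.

```python
import codecs


def strip_bom(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if there is one."""
    # "utf-8-sig" skips a leading BOM and otherwise behaves like "utf-8".
    return raw.decode("utf-8-sig")


# A tiny stand-in for an exported file that starts with a BOM.
sample = codecs.BOM_UTF8 + "0 HEAD\n1 CHAR UTF-8\n".encode("utf-8")

text = strip_bom(sample)
first_line = text.splitlines()[0]

# Plain "utf-8" keeps the BOM as an invisible U+FEFF character, and a
# single-byte codec such as latin-1 turns it into three visible ones.
kept_bom = sample.decode("utf-8")
mojibake = sample.decode("latin-1")
```

Here `first_line` is the plain `0 HEAD` record, while `kept_bom` and `mojibake` both start with extra characters that a line regex would not expect.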