Combining characters in ANSEL documents not handled properly

GEDCOM parser for Python version:
Python version: 3.7
Operating System: macOS

Description

Combining characters in ANSEL documents do not appear to be handled correctly. In the ANSEL encoding, a combining character precedes the character it modifies, whereas in Unicode it follows that character. This reordering does not appear to happen when reading ANSEL GEDCOM documents.
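For illustration, the ordering rule can be checked with the standard library alone, independently of ged4py. This is a minimal sketch that assumes ANSEL byte 0xEA corresponds to U+030A (COMBINING RING ABOVE); the variable names are only for this example.

```python
import unicodedata

# Assumption: ANSEL 0xEA (ring above) corresponds to U+030A in Unicode.
RING_ABOVE = "\u030a"

# Unicode order: base character first, then the combining mark.
unicode_order = "a" + RING_ABOVE
# ANSEL order carried over unchanged: mark first, then the base character.
ansel_order = RING_ABOVE + "a"

composed = unicodedata.normalize("NFC", unicode_order)
uncomposed = unicodedata.normalize("NFC", ansel_order)

print(len(composed))    # 1: composes to the single character U+00E5
print(len(uncomposed))  # 2: the mark has no base before it, so nothing composes
```

Only the base-then-mark order composes to "å", which is why the mark has to be moved during decoding.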

What I Did

import io

import ged4py

doc = b"""
0 HEAD
1 CHAR ANSEL
0 TAG P\xea
1 CONC al
""".strip()

with io.BytesIO(doc) as file:
    with ged4py.parser.GedcomReader(file) as reader:
        # Byte offset 20 is the start of the "0 TAG" record.
        note = reader.read_record(20)
        print(note.value)

Given the document, I would have expected the output:

Pål

Instead, I'm seeing:

P̊al

This suggests that the combining character keeps its original position when it is translated to Unicode. Because a Unicode combining character applies to the character before it, the ring ends up on the first character ("P") instead of the second ("a").
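One possible way to do the reordering after decoding is sketched here. This is not ged4py's implementation; `reorder_combining` is a hypothetical helper, and it assumes the text has already been decoded so that each ANSEL combining byte is a Unicode combining character still sitting before its base character.

```python
import unicodedata


def reorder_combining(text):
    """Move each run of combining marks so it follows the next base character.

    Hypothetical helper for illustration; not part of ged4py.
    """
    out = []
    pending = []  # combining marks waiting for their base character
    for ch in text:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # trailing marks with no base are kept as they are
    return "".join(out)


# "P", U+030A, "a", "l": the value from the report, with the mark in ANSEL order.
ansel_order = "P\u030aal"
fixed = unicodedata.normalize("NFC", reorder_combining(ansel_order))
print(fixed)
```

Because the reordering runs on the full value, after the CONC line has been joined, the mark at the end of "P\xea" can still find the "a" that begins the continuation.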

andy-z / ged4py

Combining characters in ANSEL documents not handled properly #7
