Clear-Bible / macula-hebrew

Syntax trees, morphology, and linguistic annotations for the Hebrew Bible

Misnumbered nodes in 1 Chronicles 20 #125

Open · mr-martian opened this issue 3 months ago

mr-martian commented 3 months ago

There's a segment of 1 Chronicles 20 numbered as if it were in Genesis: its node IDs start with 01015 (book 01, chapter 15) instead of 13020.

https://github.com/Clear-Bible/macula-hebrew/blob/5bcf8395fbc5a7ec4ae2da7bbd91e46cf925e9e6/WLC/nodes/13-1Ch-020.xml#L845-L859

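The file names (e.g. `13-1Ch-020.xml`) suggest that every node ID should begin with a two-digit book number followed by a three-digit chapter, which is the prefix the validation script in the next comment checks. Below is a minimal sketch that decodes that prefix. It assumes that BB CCC layout, strips an optional leading "o", and only knows the books mentioned in this thread. `decode_prefix` and `BOOKS` are hypothetical names, not part of macula-hebrew.

```python
# Book numbers as used in the WLC/nodes file names (e.g. "13-1Ch-020.xml");
# 01 = Genesis per this issue. Only the books mentioned in this thread are listed.
BOOKS = {"01": "Gen", "13": "1Ch", "14": "2Ch", "15": "Ezr"}


def decode_prefix(node_id: str) -> tuple[str, int]:
    """Return (book abbreviation, chapter) from the first five digits of an ID.

    Assumes the BB CCC ... layout implied by the file names; an optional
    leading "o" is stripped first. Later digits are not interpreted.
    """
    digits = node_id.removeprefix("o")
    book, chapter = digits[:2], int(digits[2:5])
    return BOOKS.get(book, book), chapter


print(decode_prefix("0101501400210021"))  # ('Gen', 15)
```

Applied to one of the IDs found in `13-1Ch-020.xml`, this yields Genesis 15 rather than 1 Chronicles 20.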
mr-martian commented 3 months ago

Some more oddities, found with the script below (validate_ids.py):

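import glob
import os
from xml.etree import ElementTree as ET

# ElementTree's expanded name for the xml:id attribute
XML_ID = '{http://www.w3.org/XML/1998/namespace}id'

for fname in sorted(glob.glob('WLC/nodes/*.xml')):
    # File names look like "13-1Ch-020.xml": book number, book abbreviation, chapter
    pieces = os.path.splitext(os.path.basename(fname))[0].split('-')
    if len(pieces) != 3:
        continue
    # Every ID in the file should start with book number + chapter, e.g. "13020"
    expected = pieces[0] + pieces[2]
    root = ET.parse(fname).getroot()
    for node in root.iter():
        # Nodes carry either a nodeId or an xml:id (the latter may be prefixed with "o")
        nid = node.get('nodeId', node.get(XML_ID))
        if nid and not nid.startswith((expected, 'o' + expected)):
            print(fname, nid)

$ python3 validate_ids.py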
WLC/nodes/13-1Ch-020.xml 0101501400210021
WLC/nodes/13-1Ch-020.xml 0101501400210020
WLC/nodes/13-1Ch-020.xml 0101501400210011
WLC/nodes/13-1Ch-020.xml 0101501400210010
WLC/nodes/13-1Ch-020.xml 0101501400310011
WLC/nodes/13-1Ch-020.xml 0101501400310010
WLC/nodes/14-2Ch-020.xml 0101501400210021
WLC/nodes/14-2Ch-020.xml 0101501400210020
WLC/nodes/14-2Ch-020.xml 0101501400210011
WLC/nodes/14-2Ch-020.xml 0101501400210010
WLC/nodes/14-2Ch-020.xml 0101501400310011
WLC/nodes/14-2Ch-020.xml 0101501400310010
WLC/nodes/14-2Ch-020.xml 0101501400120021
WLC/nodes/14-2Ch-020.xml 0101501400120020
WLC/nodes/14-2Ch-020.xml 0101501400120011
WLC/nodes/14-2Ch-020.xml 0101501400120010
WLC/nodes/14-2Ch-020.xml 0101501400210011
WLC/nodes/14-2Ch-020.xml 0101501400210010
WLC/nodes/14-2Ch-024.xml 0101501400210021
WLC/nodes/14-2Ch-024.xml 0101501400210020
WLC/nodes/14-2Ch-024.xml 0101501400210011
WLC/nodes/14-2Ch-024.xml 0101501400210010
WLC/nodes/14-2Ch-024.xml 0101501400310011
WLC/nodes/14-2Ch-024.xml 0101501400310010
WLC/nodes/15-Ezr-003.xml 0101501400120021
WLC/nodes/15-Ezr-003.xml 0101501400120020
WLC/nodes/15-Ezr-003.xml 0101501400120011
WLC/nodes/15-Ezr-003.xml 0101501400120010
WLC/nodes/15-Ezr-003.xml 0101501400210011
WLC/nodes/15-Ezr-003.xml 0101501400210010