Modify Python word segmentation script so that it handles <div>s with @type="textpart"

The Python script that does word segmentation currently looks for //div[@subtype="transcription"]/p and applies the word segmentation rules to the text and element nodes inside that <p> element.

However, some inscriptions carry multiple texts, or have text on more than one part of the object. In these cases, the transcription div contains more than one textpart div, and the structure is as follows:

//div[@subtype="transcription"]/div[@type="textpart"]/p

For example, caes0509.xml:

    <div type="edition" subtype="transcription" ana="b1">
        <div type="textpart" subtype="obverse">
            <p>βονόσου</p>
        </div>
        <div type="textpart" subtype="reverse">
            <p><foreign xml:lang="lat">Bonosu</foreign></p>
        </div>
    </div>

Other examples: jeru0522.xml, mare0437

The script currently locates and segments the contents of the <p> in the first textpart only. It either converts or ignores the subsequent ones, but in either case only the first one is written to the segmented output.

The script should convert and output each of the textpart divs.
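
As a rough illustration of the lookup this would need, here is a minimal sketch using the standard library's xml.etree.ElementTree. It is not the existing script's code: the function name find_transcription_paragraphs and the SAMPLE document are hypothetical, and it assumes the files declare the TEI namespace (http://www.tei-c.org/ns/1.0), which the excerpt above does not show.

```python
import xml.etree.ElementTree as ET

# Assumption: the inscription files use the TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def find_transcription_paragraphs(root):
    """Collect every <p> to segment (hypothetical helper).

    Covers both layouts: <p> directly under the transcription div,
    and <p> inside each child div with type="textpart".
    """
    paragraphs = []
    for div in root.iterfind(".//tei:div[@subtype='transcription']", TEI_NS):
        # Single-text layout: div/p
        paragraphs.extend(div.findall("tei:p", TEI_NS))
        # Multi-part layout: div/div[@type="textpart"]/p, in document order
        paragraphs.extend(div.findall("tei:div[@type='textpart']/tei:p", TEI_NS))
    return paragraphs


# Hypothetical wrapper around the caes0509.xml excerpt, for demonstration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition" subtype="transcription" ana="b1">
  <div type="textpart" subtype="obverse"><p>βονόσου</p></div>
  <div type="textpart" subtype="reverse">
    <p><foreign xml:lang="lat">Bonosu</foreign></p>
  </div>
</div>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
# One string per <p>, including text inside child elements such as <foreign>.
texts = ["".join(p.itertext()) for p in find_transcription_paragraphs(root)]
print(texts)
```

Each returned <p> could then be passed through the existing segmentation rules in turn, so that every textpart ends up in the segmented output rather than only the first.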

Python script folder with output files

Will add example output - current and desired

Brown-University-Library / OLD-ARCHIVED_iip-production

Modify python word segmentation script so that it handles <div>s with @textParts #132