erc-dharma / tfc-nusantara-epigraphy

DHARMA project task force C, Nusantara epigraphic corpus
https://dharma.hypotheses.org/
Creative Commons Attribution 4.0 International

deal with explicit line-breaking hyphens in autoencoded xml files #62

Closed: arlogriffiths closed this issue 3 months ago

arlogriffiths commented 3 months ago

@ekobastiawan: I forget exactly how we produced the XML files for Jeru-Jeru, Gulung-Gulung and Linggasuntan, but I think the process was partially automated by Axelle about a year ago. See the <revisionDesc> sections of the relevant files.

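I just noticed that the explicit line-breaking hyphens carried over from our gdoc have not yet been dealt with properly: the hyphen at the end of a line should be removed, and the following <lb break="no"/> should attach directly to the preceding word fragment with no whitespace in between.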

Example:

<lb break="no" n="d9"/><supplied reason="lost">ta</supplied> Umasuki sarvvaprāṇa kita sa-
<lb break="no" n="d10"/><supplied reason="lost">kala</supplied> sākṣībhūta sthīti hana sukṣma

Should be:

<lb break="no" n="d9"/><supplied reason="lost">ta</supplied> Umasuki sarvvaprāṇa kita sa<lb break="no" 
n="d10"/><supplied reason="lost">kala</supplied> sākṣībhūta sthīti hana sukṣma

Can you go through those files and make the necessary changes? Thanks.
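
For reference, the change requested above could be partially automated. The following is only a minimal sketch in Python, under these assumptions: a line-breaking hyphen is a "-" at the very end of a physical line, the next line starts with an <lb .../> element carrying break="no", and the last attribute of that element contains no spaces. The function name join_hyphenated_breaks is hypothetical and not part of any existing DHARMA tooling. Results should be checked by hand, since a line-final hyphen may occasionally be meaningful.

```python
import re

# Assumption: a hyphen at the end of a line, followed by a newline and an
# <lb .../> element that already carries break="no", is a line-breaking
# hyphen inherited from the source document and can be dropped.
HYPHEN_BEFORE_LB = re.compile(
    r'-[ \t]*\r?\n[ \t]*(?P<tag><lb\b[^>]*\bbreak="no"[^>]*/>)'
)


def _rewrite(match: re.Match) -> str:
    """Drop the hyphen and move the physical newline inside the <lb/> tag."""
    tag = match.group("tag")
    # Split at the last space in the tag so the newline falls between
    # attributes, e.g. '<lb break="no"' + newline + 'n="d10"/>'.
    head, sep, tail = tag.rpartition(" ")
    return f"{head}\n{tail}" if sep else tag


def join_hyphenated_breaks(xml_text: str) -> str:
    """Return xml_text with line-final hyphens before <lb break="no"/> removed."""
    return HYPHEN_BEFORE_LB.sub(_rewrite, xml_text)


if __name__ == "__main__":
    sample = (
        '<lb break="no" n="d9"/><supplied reason="lost">ta</supplied> '
        "Umasuki sarvvaprāṇa kita sa-\n"
        '<lb break="no" n="d10"/><supplied reason="lost">kala</supplied> '
        "sākṣībhūta sthīti hana sukṣma"
    )
    print(join_hyphenated_breaks(sample))
```

The newline is kept inside the <lb/> tag, as in the "Should be" example, so that each epigraphic line still starts on its own physical line while no whitespace separates the word fragment from the element. The sketch writes the newline directly after break="no" without a trailing space, which is equivalent XML.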