Open kigawas opened 2 months ago
I appreciate the offer. Sure, go for it!
Those XML files are infamously inconsistent. I'll make you a member of the cltk org, but you can make pull requests as usual on this repo.
@kylepjohnson Thanks! I'll propose a PR soon to regenerate the JSON for Ammianus/opensource/amm_lat.xml. If it looks good to you, I'll expand it to other files as well.
@kylepjohnson You can compare the newly generated file with the old one.
In the new file, <corr sic="earn">eam</corr> -> eam.
The only difference is this line: "Hoc Marte Cyzico reserata, Procopius ad eam propere festinavit, veniaque universis qui repugnavere donatis, Serenianum solum iniectis vinculis, iussit duci Nicaeam servandum artissime. 12. Statimque Ormizdae mature iuveni ..."
This is because the original XML file is missing <milestone unit="section" n="12"/> before Statimque.
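To make the two behaviours above concrete, here is a minimal sketch (not the repo's actual xml_to_json.py) of how a parser could keep only the corrected reading of a <corr> element and start a new section at each <milestone unit="section"/>. The function name split_sections, the shortened sample paragraph, and the section number 11 are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def split_sections(paragraph_xml):
    """Return {section number: plain text} for one paragraph of TEI-like XML.

    Hypothetical helper: <corr sic="..."> contributes its corrected text,
    and <milestone unit="section" n="..."/> switches the current section.
    Text that appears before any milestone is stored under the key None.
    """
    root = ET.fromstring(paragraph_xml)
    sections = {}
    state = {"n": None}  # section number currently being filled

    def add(text):
        if text:
            sections.setdefault(state["n"], []).append(text)

    def walk(elem):
        if elem.tag == "milestone" and elem.get("unit") == "section":
            state["n"] = elem.get("n")
        else:
            # For <corr>, elem.text is the corrected word; "sic" is ignored.
            add(elem.text)
            for child in elem:
                walk(child)
                add(child.tail)  # text that follows the child element

    walk(root)
    # Join the fragments and collapse runs of whitespace.
    return {n: " ".join("".join(parts).split()) for n, parts in sections.items()}


# Shortened, made-up sample modelled on the line quoted above.
sample = (
    '<p><milestone unit="section" n="11"/>Procopius ad '
    '<corr sic="earn">eam</corr> propere festinavit. '
    '<milestone unit="section" n="12"/>Statimque Ormizdae</p>'
)
print(split_sections(sample))
```

If the second milestone is absent, as in the original XML, "Statimque Ormizdae" simply stays in the previous section, which matches the single differing line described above.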
@kigawas I closed the last PR (#3). Let's talk about it for a bit before the next one. I think your goal ought to be to parse these better, but keep the output files otherwise the same.
Are the .xml.json files input or output? Since they contain exactly the same information as the .xml files, it's not necessary to maintain two duplicate copies.
They are outputs, and they are necessary: the XML is very inconsistent, and it is inconvenient for downstream users to parse the XML into their databases and applications.
Currently, the JSON files are not correctly parsed. For example, minfel.octav_lat.json's text values are null. Some corrected words are also mixed into the output JSON indistinguishably.
I'd like to help rewrite the parsing script (xml_to_json.py) to make it work properly. Can you add me as a collaborator?
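One way to check a generated file for the null values mentioned above is to walk the parsed JSON and report every path whose value is null. This is only a sketch: the helper name find_nulls and the nested layout of the sample document are assumptions, not the real structure of the repo's JSON files.

```python
import json


def find_nulls(node, path=""):
    """Yield a slash-separated path for every null (None) value in parsed JSON."""
    if node is None:
        yield path or "<root>"
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_nulls(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_nulls(value, f"{path}/{index}")


# Hypothetical document; the real files may be laid out differently.
doc = json.loads('{"meta": "x", "text": {"0": null, "1": {"0": "Lorem"}}}')
print(list(find_nulls(doc)))
```

For a real file, the same function could be applied to the result of json.load on the opened file.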