Transkribus / TranskribusCore

Note: the repo has been moved to https://gitlab.com/readcoop/Transkribus/TranskribusCore
GNU General Public License v3.0

TEI Export: output might not be well-formed XML/valid TEI #26

Open kahlep opened 7 years ago

kahlep commented 7 years ago

For documents that contain overlapping tags (tag ranges that are not properly nested inside one another), the TEI XML produced by the export routine is not always well-formed. It therefore cannot be used with TEI-conformant tools such as Voyant Tools.

An example can be found in Document 8393, page 141. The produced TEI XML for this page includes:

<l facs='#facs_141_line_1490048050651_115'>given <hi rend='underlined:true;'>by the King of Sweden to his Subjects <choice><expan></expan><abbr>Aug</hi><hi rend='underlined:true; superscript:true;'>t</hi><hi rend='underlined:true;'>.</abbr></choice> 21</hi><hi rend='underlined:true; superscript:true;'>st</hi><hi rend='underlined:true;'>.</hi> 1772</l>
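The fragment can also be checked outside oXygen with Python's standard library. This is a minimal sketch; the helper name `is_well_formed` is hypothetical and not part of Transkribus. Expat rejects the fragment because `</hi>` closes an element while `<abbr>` is still open.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the exported TEI for Document 8393, page 141.
FRAGMENT = (
    "<l facs='#facs_141_line_1490048050651_115'>given "
    "<hi rend='underlined:true;'>by the King of Sweden to his Subjects "
    "<choice><expan></expan><abbr>Aug</hi>"
    "<hi rend='underlined:true; superscript:true;'>t</hi>"
    "<hi rend='underlined:true;'>.</abbr></choice> 21</hi>"
    "<hi rend='underlined:true; superscript:true;'>st</hi>"
    "<hi rend='underlined:true;'>.</hi> 1772</l>"
)


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as one well-formed XML element."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        print(f"not well-formed: {err}")  # expat reports a "mismatched tag"
        return False
    return True


print(is_well_formed(FRAGMENT))  # prints the parser error, then False
```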

oXygen error message: The element type "abbr" must be terminated by the matching end-tag "</abbr>".
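A common way to serialize overlapping ranges as well-formed XML is to cut the text at every range boundary and wrap each piece in all ranges that cover it, so that no element ever crosses another element's end tag. The sketch below illustrates that general technique; it is not how the Transkribus exporter works, and the `Span` model and `to_nested_xml` name are hypothetical.

```python
from typing import NamedTuple
from xml.sax.saxutils import escape, quoteattr


class Span(NamedTuple):
    """Hypothetical range annotation: `tag` covers text[start:end]."""
    start: int
    end: int
    tag: str
    attrs: dict


def to_nested_xml(text: str, spans: list[Span]) -> str:
    """Serialize possibly overlapping spans as properly nested XML."""
    # Every start/end offset is a point where the set of active tags may change.
    cuts = sorted({0, len(text)} | {s.start for s in spans} | {s.end for s in spans})
    pieces = []
    for a, b in zip(cuts, cuts[1:]):
        active = [s for s in spans if s.start <= a and b <= s.end]
        # Longer spans go outermost; ties are broken by start offset.
        active.sort(key=lambda s: (s.start - s.end, s.start))
        chunk = escape(text[a:b])
        for s in reversed(active):  # wrap the innermost span first
            attrs = "".join(f" {k}={quoteattr(v)}" for k, v in s.attrs.items())
            chunk = f"<{s.tag}{attrs}>{chunk}</{s.tag}>"
        pieces.append(chunk)
    return "".join(pieces)


print(to_nested_xml("abcdef", [
    Span(0, 4, "hi", {"rend": "underlined:true;"}),
    Span(2, 6, "abbr", {}),
]))
```

Splitting is harmless for presentational ranges such as `hi`, but splitting a semantic element like `abbr` turns one abbreviation into two. A real exporter might therefore split only `hi` ranges, or link the fragments with TEI attributes such as `@next`/`@prev`.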

There should also be an export option to exclude user-defined tags without a valid TEI equivalent, in order to produce valid TEI XML.
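Such an option could work as a post-processing step that unwraps every element not on an allow-list of TEI names, keeping its text and child elements. The sketch below assumes a hypothetical allow-list (`TEI_ALLOWED`) and, for brevity, un-namespaced tags; real TEI output would use the `http://www.tei-c.org/ns/1.0` namespace, so the tags would need to be compared by local name. The root element itself is not checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list of TEI element names the export may keep; anything
# else (e.g. a user-defined tag with no TEI mapping) is unwrapped.
TEI_ALLOWED = {"l", "hi", "choice", "abbr", "expan", "persName", "placeName", "date"}


def _append_before(parent: ET.Element, index: int, s: str) -> None:
    """Append s to the text that precedes parent[index]."""
    if not s:
        return
    if index == 0:
        parent.text = (parent.text or "") + s
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + s


def unwrap_unknown(parent: ET.Element, allowed: set[str]) -> None:
    """Remove disallowed descendant elements but keep their text and children."""
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap_unknown(child, allowed)  # clean the subtree first
        if child.tag in allowed:
            i += 1
            continue
        grandchildren = list(child)
        tail = child.tail or ""
        _append_before(parent, i, child.text or "")
        del parent[i]
        for offset, gc in enumerate(grandchildren):
            parent.insert(i + offset, gc)
        i += len(grandchildren)
        _append_before(parent, i, tail)  # child's tail follows its last content


line = ET.fromstring('<l>a <myTag>b <hi rend="x">c</hi> d</myTag> e</l>')
unwrap_unknown(line, TEI_ALLOWED)
print(ET.tostring(line, encoding="unicode"))  # <l>a b <hi rend="x">c</hi> d e</l>
```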