INL / OpenConvert

Text conversion tool (from e.g. Word, HTML, txt to corpus formats TEI or FoLiA)
http://openconvert.clarin.inl.nl/

Issue with duplicate IDs after conversion to FoLiA #4

Open JessedeDoes opened 5 years ago

proycon commented 5 years ago

To solve this with little effort, you might perhaps want to use the tei2folia stylesheet I recently developed (derived from your earlier work): https://github.com/proycon/foliatools/blob/master/foliatools/tei2folia.xsl

You could merge that one back into OpenConvert, although in my tei2folia tool I do rely on some extra post-processing using the FoLiA Python library. I also downgraded the stylesheet from XSLT 2.0 to 1.0, as I'm still tied to some older implementations (libxml2). There may also be some INT-specific things which got removed. On the bright side though, this stylesheet produces valid FoLiA v2 and was tested on the TEI file samples you sent me a while back, as well as the whole of Nederlab's DBNL.
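
To check whether a converted file still has the duplicate ID problem, something like the following rough sketch might help. It uses only the Python standard library, not the FoLiA library or either stylesheet; the `duplicate_ids` helper and the sample fragment are made up for illustration, and the fragment is deliberately minimal rather than a complete valid FoLiA document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# xml:id lives in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text):
    """Return a sorted list of xml:id values that occur more than once.

    Hypothetical helper for illustration; not part of OpenConvert or foliatools.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up fragment with one repeated ID (not a complete FoLiA document).
sample = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <text xml:id="doc.text">
    <p xml:id="doc.p.1"/>
    <p xml:id="doc.p.1"/>
    <p xml:id="doc.p.2"/>
  </text>
</FoLiA>"""

print(duplicate_ids(sample))  # ['doc.p.1']
```

An empty list means no `xml:id` value is repeated; it says nothing about whether the document is otherwise valid FoLiA.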