Exmaralda-Org / exmaralda


ISO/TEI Conversion: Test roundtripping #367

Open berndmoos opened 1 year ago

berndmoos commented 1 year ago

Started with a very ugly HIAT example in which utterances would end within events, overlaps were not fully specified, etc. Round-tripping was very messy and effectively unusable for this example. Fixed some errors and oversights in the existing XSL chain and added additional transformations, namely:

Round-tripping now seems to work for the ugly HIAT example, and it also (still) works for the more well-behaved HIAT examples: Beckhams, AnneWill, and one HAMATAC transcript (Stella/Tansu). Tier order is not maintained in round-tripping because it can't be. Next: test round-tripping for the easier cases.
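The thread refers to "the existing XSL chain" without showing it. A minimal sketch of how such a chain can be run with the standard JAXP API (`javax.xml.transform`), feeding each step's DOM output into the next step; the class name `XslChain` and loading stylesheets from strings are assumptions for illustration, not EXMARaLDA's actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: runs a document through an ordered list of XSLT stylesheets.
public class XslChain {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final List<Templates> steps = new ArrayList<>();

    // Compile one stylesheet once and append it to the end of the chain.
    public XslChain add(String xslt) throws TransformerConfigurationException {
        steps.add(factory.newTemplates(new StreamSource(new StringReader(xslt))));
        return this;
    }

    // Apply every step in order; each step's output Document is the next step's input.
    public Document run(Document input) throws TransformerException {
        Document current = input;
        for (Templates step : steps) {
            DOMResult result = new DOMResult(); // no node given, so the transformer creates a new Document
            step.newTransformer().transform(new DOMSource(current), result);
            current = (Document) result.getNode();
        }
        return current;
    }

    // Parse an XML string into a namespace-aware DOM.
    public static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

Keeping the intermediate results as DOM avoids serialising and re-parsing between steps, and it makes it easy to swap an individual XSL step for Java code working on the same `Document`. The JDK's built-in processor only supports XSLT 1.0; stylesheets needing XSLT 2.0 would need a processor such as Saxon, which plugs into the same `TransformerFactory` API.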

berndmoos commented 1 year ago

Round-tripping also seems to work (one test file each) for:

There will probably still be issues, especially with non-timed segments and atomic timed segments, but I can't find any right now. A next step could be to make sure that all ISO/TEI exports conform to some kind of schema, and perhaps also that the intermediate steps do. See #369.
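For the idea of checking exports against a schema, a minimal sketch using JAXP's `javax.xml.validation` API. It assumes a W3C XML Schema (XSD) for the target ISO/TEI format is available: JAXP only guarantees XSD support, so validating against a RELAX NG schema would need an additional library. The class `ExportValidator` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates exported XML strings against one XSD.
public class ExportValidator {
    private final Schema schema;

    // Compile the schema once; a compiled Schema is thread-safe and can be reused.
    public ExportValidator(String xsd) throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns empty if the document is valid, otherwise the first validation error.
    public Optional<String> validate(String xml) throws IOException {
        Validator v = schema.newValidator(); // Validators are not thread-safe, so one per call
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            // With no ErrorHandler set, the first error is thrown as an exception.
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }
}
```

The same validator could be called after each intermediate step of the conversion, not just on the final export.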

berndmoos commented 1 year ago

Let's close it until further notice.

berndmoos commented 1 year ago

Further notice: Removing unnecessary timepoints is either extremely slow or does something odd: there are no error messages, but the import gets stuck on that step for at least 5 minutes (for a very large transcript, but so what).

berndmoos commented 1 year ago

It is just very slow: 24 minutes for a transcription with 8 tiers × 2250 events (FLK without tokenisation/token annotation). The imported EXB has 25 tiers × 7000 events.

berndmoos commented 1 year ago

Fixed an error in one of the XSLs and replaced one XSL step (remove timepoints) with Java code operating on the DOM. Also added a final "remove unused timeline items" step. It is still slow, but it now takes 2 minutes instead of 24. Importing large, token-annotated ISO/TEI files into the Partitur-Editor is probably not the brightest of all ideas.
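The Java code itself isn't shown here. As an illustration of the general approach, a minimal sketch of something like a "remove unused timeline items" step on the DOM. It assumes the ISO/TEI timeline consists of `tei:when` elements with `xml:id`s, and that timeline references are (optionally `#`-prefixed) values of the attributes listed in `POINTER_ATTRS`; that attribute list and the class `TimelineCleaner` are hypothetical. The sketch collects all referenced IDs in a single pass into a hash set, so each `when` is checked in constant time rather than by searching the whole document again. That avoids quadratic behaviour, which may be one reason DOM code can beat a naive XSL version, though we don't know how the original XSL was written.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: drops timeline points that nothing refers to.
public class TimelineCleaner {
    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";
    // Assumed attributes pointing at timeline IDs, e.g. start="#T3".
    // "since" is included so that no remaining <when> is left with a dangling origin.
    static final String[] POINTER_ATTRS = {"start", "end", "synch", "from", "to", "since"};

    // Removes every tei:when whose xml:id is never referenced; returns the number removed.
    public static int removeUnusedTimelineItems(Document doc) {
        // Pass 1: one walk over all elements, collecting referenced IDs into a hash set.
        Set<String> referenced = new HashSet<>();
        NodeList all = doc.getElementsByTagNameNS("*", "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            for (String attr : POINTER_ATTRS) {
                String value = e.getAttribute(attr); // "" if the attribute is absent
                if (!value.isEmpty()) {
                    referenced.add(value.startsWith("#") ? value.substring(1) : value);
                }
            }
        }
        // Pass 2: collect unreferenced <when> elements first, because the NodeList
        // is live and would shift if we removed nodes while iterating over it.
        List<Element> unused = new ArrayList<>();
        NodeList whens = doc.getElementsByTagNameNS(TEI_NS, "when");
        for (int i = 0; i < whens.getLength(); i++) {
            Element when = (Element) whens.item(i);
            if (!referenced.contains(when.getAttributeNS(XML_NS, "id"))) {
                unused.add(when);
            }
        }
        for (Element when : unused) {
            when.getParentNode().removeChild(when);
        }
        return unused.size();
    }
}
```

The document must be parsed namespace-aware so that `xml:id` and the TEI namespace are resolved.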