It would be very useful to have a transformation that extracts any tables from PAGE-XML to CSV.
@bertsky:
Thoughts:
Each TableRegion needs its own CSV file, so it is not immediately clear how this fits the page→page converter paradigm
(e.g. for page→text, one could simply paste the CSV into the middle of the plain text, but writing multiple output files is probably the better approach)
CSV may already be too coarse (no multi-row/column spans, no header distinction)
Perhaps this would be better moved to the ocr-fileformat subrepo?
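To make the points above concrete, here is a rough Python sketch of what a per-table extraction could look like. It is not an existing OCR-D processor. The function name `tables_to_csv` is hypothetical, and it assumes that each cell is a TextRegion directly inside the TableRegion, carrying a `Roles/TableCellRole` element with `rowIndex` and `columnIndex` attributes (as in the 2019 PAGE schema, as far as I understand it). It returns one CSV string per table id, and it silently drops `rowSpan`, `colSpan` and `header`, which is exactly the coarseness concern.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the sketch works across PAGE versions."""
    return tag.rsplit("}", 1)[-1]


def child(element, name):
    """Return the first direct child with the given local name, or None."""
    for sub in element:
        if local(sub.tag) == name:
            return sub
    return None


def cell_text(region):
    """Region-level TextEquiv/Unicode text, or '' if absent."""
    equiv = child(region, "TextEquiv")
    uni = child(equiv, "Unicode") if equiv is not None else None
    return (uni.text or "") if uni is not None else ""


def tables_to_csv(page_xml):
    """Hypothetical helper: map each TableRegion id to its CSV text."""
    root = ET.fromstring(page_xml)
    result = {}
    for table in root.iter():
        if local(table.tag) != "TableRegion":
            continue
        cells = {}
        for region in table:
            if local(region.tag) != "TextRegion":
                continue
            roles = child(region, "Roles")
            role = child(roles, "TableCellRole") if roles is not None else None
            if role is None:
                continue  # assumption: regions without a cell role are skipped
            key = (int(role.get("rowIndex")), int(role.get("columnIndex")))
            cells[key] = cell_text(region)
        if not cells:
            continue
        n_rows = max(r for r, _ in cells) + 1
        n_cols = max(c for _, c in cells) + 1
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for r in range(n_rows):
            # Cells covered by a span (or simply missing) become empty strings.
            writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
        result[table.get("id")] = buf.getvalue()
    return result
```

Since the result is keyed by table id, a caller could write one output file per TableRegion, or splice the strings into plain text for the page→text case.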
From https://github.com/OCR-D/ocrd_fileformat/issues/46