Table extraction

From https://github.com/OCR-D/ocrd_fileformat/issues/46

@kba:

It would be very useful to have a transformation that extracts any tables from PAGE-XML to CSV.

@bertsky:

Thoughts:

Each TableRegion needs its own CSV file, so it is not immediately clear how this fits the page→page converter paradigm. For page→text, one could simply paste the CSV into the middle of the plain text, but producing multiple output files is probably the better choice in most cases.
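A minimal Python sketch of the per-table approach, for illustration only. It assumes the PAGE 2019 convention in which each table cell is a TextRegion nested directly inside the TableRegion and carries a Roles/TableCellRole child with rowIndex and columnIndex attributes. It also assumes cell text comes from the region-level TextEquiv/Unicode. The function names are hypothetical and are not part of ocrd_fileformat or ocr-fileformat. The sketch returns one CSV string per TableRegion, keyed by the region's id, which leaves the caller free to write separate files or splice them into a text output.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Strip the "{namespace}" prefix so the code works across PAGE schema versions.
    return tag.rsplit('}', 1)[-1]


def first_child(elem, name):
    # Return the first direct child with the given local name, or None.
    for child in elem:
        if local_name(child.tag) == name:
            return child
    return None


def cell_text(region):
    # Region-level TextEquiv/Unicode (assumption: cells carry their text there).
    equiv = first_child(region, 'TextEquiv')
    unicode_elem = first_child(equiv, 'Unicode') if equiv is not None else None
    return (unicode_elem.text or '') if unicode_elem is not None else ''


def table_to_rows(table):
    # Collect cells by (row, column); spans only widen the grid, they are not kept.
    cells = {}
    nrows = ncols = 0
    for region in table:
        if local_name(region.tag) != 'TextRegion':
            continue
        roles = first_child(region, 'Roles')
        role = first_child(roles, 'TableCellRole') if roles is not None else None
        if role is None:
            continue
        row, col = int(role.get('rowIndex')), int(role.get('columnIndex'))
        row_span = int(role.get('rowSpan', '1'))
        col_span = int(role.get('colSpan', '1'))
        cells[(row, col)] = cell_text(region)
        nrows = max(nrows, row + row_span)
        ncols = max(ncols, col + col_span)
    return [[cells.get((r, c), '') for c in range(ncols)] for r in range(nrows)]


def extract_tables(page_xml):
    # One CSV string per TableRegion, keyed by the TableRegion's id.
    root = ET.fromstring(page_xml)
    result = {}
    for elem in root.iter():
        if local_name(elem.tag) == 'TableRegion':
            buf = io.StringIO()
            csv.writer(buf, lineterminator='\n').writerows(table_to_rows(elem))
            result[elem.get('id')] = buf.getvalue()
    return result
```

Writing each returned value to its own file, for example under a hypothetical `<page-id>_<table-id>.csv` naming scheme, would give the multiple-output-files variant.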

CSV may already be too coarse a format: it cannot express cells spanning several rows or columns, and it does not distinguish header cells from body cells.
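To make the coarseness concrete, the following self-contained Python sketch flattens table cells into a CSV grid. It uses a hypothetical Cell type standing in for already-parsed cell data (row/column index, span, header flag). A header cell "Year" spanning two columns produces exactly the same CSV as a plain first row containing "Year" followed by an empty cell, because both the span and the header flag are lost.

```python
import csv
import io
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical stand-in for one parsed table cell.
    row: int
    col: int
    text: str
    row_span: int = 1
    col_span: int = 1
    header: bool = False


def to_csv(cells):
    # The grid must be large enough to cover every cell including its span.
    nrows = max(c.row + c.row_span for c in cells)
    ncols = max(c.col + c.col_span for c in cells)
    grid = [[''] * ncols for _ in range(nrows)]
    for c in cells:
        # Only the anchor position receives the text; span and header flag are dropped.
        grid[c.row][c.col] = c.text
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerows(grid)
    return buf.getvalue()


spanned = [Cell(0, 0, 'Year', col_span=2, header=True), Cell(1, 0, '1900'), Cell(1, 1, '1901')]
plain = [Cell(0, 0, 'Year'), Cell(0, 1, ''), Cell(1, 0, '1900'), Cell(1, 1, '1901')]
print(to_csv(spanned) == to_csv(plain))  # True: the two tables are indistinguishable in CSV
```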

Perhaps this issue would be better transferred to the ocr-fileformat subrepo?

See also: UB-Mannheim/ocr-fileformat #164, "Table extraction"