axa-group / Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
Apache License 2.0
5.86k stars 311 forks

Support detecting text in vertical direction #645

Closed baohq1595 closed 1 year ago

baohq1595 commented 1 year ago

I am working with CJK languages where the text runs vertically. I tried the tool, but it cannot detect vertical text. pdfminer in Python can handle this, so it would be great if Parsr could support it too.
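For context, here is a minimal sketch of how vertical text detection is usually switched on in pdfminer. It assumes the maintained pdfminer.six fork, whose `LAParams` layout settings accept a `detect_vertical` flag. The function name `extract_vertical_text` is hypothetical and is not part of pdfminer or Parsr.

```python
def extract_vertical_text(pdf_path: str, detect_vertical: bool = True) -> str:
    """Extract text from a PDF, optionally letting layout analysis detect vertical lines.

    Hypothetical helper for illustration; assumes pdfminer.six is installed.
    """
    # Imported lazily so this sketch can be loaded without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    # detect_vertical=True asks the layout analyzer to also group characters
    # into vertical text lines (it is off by default).
    laparams = LAParams(detect_vertical=detect_vertical)
    return extract_text(pdf_path, laparams=laparams)
```

Calling `extract_vertical_text("input.pdf")`, where `input.pdf` is a placeholder path, should return the extracted text as a single string. How well the vertical lines are grouped depends on the document.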

BinaryBrain commented 1 year ago

Although Parsr uses pdfminer, supporting vertical text would require rewriting the ReadingOrderDetectionModule, and the algorithm would be much more complicated. We don't have the resources to do this in the near future.
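To illustrate why reading order is the hard part, here is a rough sketch of ordering text boxes for vertical text. It is not Parsr's ReadingOrderDetectionModule. The `Box` type, the `vertical_reading_order` function, and the `column_gap` threshold are all hypothetical, and the sketch assumes the traditional CJK layout: columns read right to left, each column read top to bottom, with y growing downward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical text box; (x0, y0) is the top-left corner and y grows downward."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def center_x(box: Box) -> float:
    """Horizontal center of a box."""
    return (box.x0 + box.x1) / 2


def vertical_reading_order(boxes: List[Box], column_gap: float = 5.0) -> List[Box]:
    """Order boxes right-to-left by column, then top-to-bottom within each column.

    A box joins the current column when its horizontal center lies within
    `column_gap` of the center of that column's first box.
    """
    columns: List[List[Box]] = []
    # Visit boxes from the rightmost center to the leftmost.
    for box in sorted(boxes, key=center_x, reverse=True):
        if columns and abs(center_x(columns[-1][0]) - center_x(box)) <= column_gap:
            columns[-1].append(box)
        else:
            columns.append([box])

    ordered: List[Box] = []
    for column in columns:
        # Within a column, read from top to bottom.
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


boxes = [
    Box("D", 50, 20, 60, 30),
    Box("A", 90, 0, 100, 10),
    Box("C", 50, 0, 60, 10),
    Box("B", 90, 20, 100, 30),
]
order = [b.text for b in vertical_reading_order(boxes)]
```

A plain horizontal ordering (top to bottom, then left to right) would read these same boxes as C, A, D, B. Real pages mix horizontal and vertical blocks, uneven columns, and headers, which is where the extra complexity comes from.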