option to remove overlap duplication

OpenPecha / Toolkit

🛠 Tools to create, edit and export texts and annotations

https://toolkit.openpecha.org

Apache License 2.0

Closed eroux closed 1 year ago

eroux commented 1 year ago

There should be an option in the OCR import pipeline to detect and remove duplicates, as in

This should be relatively (!) straightforward by looking at pixel coordinates within a line and either:

remove everything that overlaps earlier text by a large enough amount (this is the best option, but more complex), or
remove everything that goes back horizontally.
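
For illustration only, here is a minimal sketch of the two options, not the toolkit's actual code. It assumes each OCR result on a line is available as a box with a left and right pixel coordinate, listed in OCR output order, and that text runs left to right. The `Box` class, the function names, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical OCR box on one line: horizontal pixel extent plus text."""
    x1: int
    x2: int
    text: str


def overlap_ratio(a: Box, b: Box) -> float:
    """Horizontal overlap of a and b as a fraction of the narrower box."""
    shared = min(a.x2, b.x2) - max(a.x1, b.x1)
    narrower = min(a.x2 - a.x1, b.x2 - b.x1)
    if shared <= 0 or narrower <= 0:
        return 0.0
    return shared / narrower


def drop_overlapping(line: List[Box], threshold: float = 0.5) -> List[Box]:
    """Option 1: drop a box that overlaps any kept box by at least `threshold`."""
    kept: List[Box] = []
    for box in line:
        if all(overlap_ratio(box, k) < threshold for k in kept):
            kept.append(box)
    return kept


def drop_backward(line: List[Box]) -> List[Box]:
    """Option 2: drop a box that does not start to the right of every kept box."""
    kept: List[Box] = []
    max_x1 = float("-inf")  # furthest start position among kept boxes
    for box in line:
        if box.x1 > max_x1:
            kept.append(box)
            max_x1 = box.x1
    return kept


if __name__ == "__main__":
    line = [Box(0, 10, "a"), Box(10, 20, "b"), Box(11, 21, "b"), Box(20, 30, "c")]
    print("".join(b.text for b in drop_overlapping(line)))
    print("".join(b.text for b in drop_backward(line)))
```

In this sketch the overlap check catches a duplicate that is shifted slightly to the right of the original, which the simpler "goes back" check lets through. That matches the remark above that the overlap option is better but more involved. The threshold would need tuning against real OCR output.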

eroux commented 1 year ago

The intervention should be around here. @ta4tsering, do you want to give it a try?

ta4tsering commented 1 year ago

Okay, I will look into it.