eroux closed this issue 1 year ago.
There should be an option in the OCR import pipeline to detect and remove duplicates, as in
https://github.com/OpenPecha/OCR-Pipelines/issues/8
This should be relatively (!) straightforward by looking at pixel coordinates within a line and either:
The intervention should be around here. @ta4tsering, do you want to give it a try?
Okay, I will look into it.
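The options that followed "either:" are not preserved in this thread, so the following is only a minimal sketch of one possible approach, not the method adopted in OCR-Pipelines. It assumes each OCR word in a line carries its text and its horizontal pixel bounds, and treats two words as duplicates when they have the same text and their horizontal extents overlap by at least a given fraction of the shorter word. The `Word` class, `remove_duplicate_words`, and the `threshold` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


# Hypothetical representation of one OCR word: its text plus horizontal pixel bounds.
@dataclass
class Word:
    text: str
    x_min: int
    x_max: int


def horizontal_overlap_ratio(a: Word, b: Word) -> float:
    """Overlap of the two horizontal extents, as a fraction of the shorter word."""
    overlap = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    if overlap <= 0:
        return 0.0
    shorter = min(a.x_max - a.x_min, b.x_max - b.x_min)
    return overlap / shorter if shorter > 0 else 0.0


def remove_duplicate_words(line: list[Word], threshold: float = 0.5) -> list[Word]:
    """Keep the first word at each position; drop later words with the same text
    whose pixel extent overlaps an already kept word by at least `threshold`."""
    kept: list[Word] = []
    # Process words left to right so the output follows reading order.
    for word in sorted(line, key=lambda w: w.x_min):
        is_duplicate = any(
            word.text == k.text and horizontal_overlap_ratio(word, k) >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(word)
    return kept


if __name__ == "__main__":
    line = [Word("a", 0, 10), Word("a", 1, 11), Word("b", 20, 30)]
    print([w.text for w in remove_duplicate_words(line)])
```

Here the second `"a"` overlaps the first by 9 of 10 pixels and is dropped, while the same text appearing at a non-overlapping position would be kept. The threshold is an assumption and would need tuning against real OCR output.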