RajSolai / TextSnatcher

How to copy text from images? The answer is TextSnatcher! Perform OCR operations in seconds on the Linux desktop.
https://textsnatcher.rf.gd/
GNU General Public License v3.0
1.26k stars, 45 forks

Invisible Unicode character at end of all text #48

Open: matt-laird opened this issue 11 months ago

matt-laird commented 11 months ago

There seems to be an invisible Unicode character, U+000C (form feed), at the end of all generated text. This causes problems in some applications when the resulting text is pasted. See the example below; the problem is on line 2:

[Screenshot of pasted output, not preserved]
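To confirm which invisible character is at the end of the output, a small diagnostic like the one below can help. It is a hypothetical Python sketch (not TextSnatcher code); `describe_trailing_chars` and the sample string are made up for illustration. It prints the code point of each of the last few characters, so a trailing form feed shows up as `U+000C`.

```python
import unicodedata
from typing import List


def describe_trailing_chars(text: str, count: int = 3) -> List[str]:
    """Return 'U+XXXX NAME' labels for the last `count` characters of `text`."""
    labels = []
    for ch in text[-count:]:
        # Control characters such as U+000C have no Unicode name,
        # so unicodedata.name() falls back to the default given here.
        name = unicodedata.name(ch, "<control>")
        labels.append(f"U+{ord(ch):04X} {name}")
    return labels


# Hypothetical OCR output that ends in a form feed.
sample = "Hello\nWorld\n\f"
print(describe_trailing_chars(sample))
# ['U+0064 LATIN SMALL LETTER D', 'U+000A <control>', 'U+000C <control>']
```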

RajSolai commented 11 months ago

I have also faced this issue for a long time. Is a simple string trim fine, or is there something in Tesseract that I should configure? Any ideas?

matt-laird commented 11 months ago

I had a brief look, and it does seem to be an artifact of Tesseract's processing. Maybe give the Tesseract FAQ a read and see whether any of the options there help; unfortunately, I can't test them myself right now.

RajSolai commented 11 months ago

I think we can just trim the string for now. Thanks, now I also know the exact Unicode character to find and remove.
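The trim discussed in this thread could look roughly like this. It is a minimal Python sketch under the assumption that the unwanted character is always U+000C at the end of the OCR text; it is not TextSnatcher's actual code, and both function names are hypothetical.

```python
FORM_FEED = "\f"  # U+000C, the character reported in this issue


def strip_trailing_form_feeds(text: str) -> str:
    """Remove only trailing U+000C characters, keeping all other whitespace."""
    return text.rstrip(FORM_FEED)


def trim_ocr_text(text: str) -> str:
    """Full trim: str.strip() treats U+000C as whitespace, so it is removed
    together with any leading or trailing spaces and newlines."""
    return text.strip()


# Hypothetical OCR output ending in a newline and a form feed.
raw = "first line\nsecond line\n\f"
print(repr(strip_trailing_form_feeds(raw)))  # 'first line\nsecond line\n'
print(repr(trim_ocr_text(raw)))              # 'first line\nsecond line'
```

The targeted version only touches the form feed, which is safer if leading or trailing newlines in the recognized text matter. The full trim is simpler and also cleans up blank lines left at the end of the output.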