dynobo / normcap

OCR powered screen-capture tool to capture information instead of images

Arabic instead of English #593

Open faveoled opened 5 months ago

faveoled commented 5 months ago

What happened?

All English text gets recognized as Arabic characters. I'm using the latest AppImage.

How did you install NormCap?

AppImage (Linux)

Operating System + Version?

Ubuntu 22.04.3

[Linux only] Display Server (DS) + Desktop environment (DE)?


Debug log output?

13:00:38 - INFO    - normcap:49 - Start NormCap v0.5.4
13:00:38 - DEBUG   - normcap:107 - Append /tmp/.mount_NormCaZT6zGv/usr/bin to AppImage internal PATH
13:00:38 - DEBUG   - normcap.gui.tray:77 - System info:
{'normcap_version': '0.5.4', 'python_version': '3.10.13', 'cli_args': '/tmp/.mount_NormCaZT6zGv/usr/app/normcap/__main__.py -v debug', 'is_briefcase_package': True, 'is_flatpak_package': False, 'is_appimage_package': True, 'platform': 'linux', 'desktop_environment': <DesktopEnvironment.GNOME: 1>, 'display_manager_is_wayland': False, 'pyside6_version': '6.6.1', 'qt_version': '6.6.1', 'qt_library_path': '/tmp/.mount_NormCaZT6zGv/usr/app_packages/PySide6/Qt/plugins, /tmp/.mount_NormCaZT6zGv/usr/python/bin', 'locale': 'DEFAULT', 'config_directory': PosixPath('/home/user/.config/normcap'), 'resources_path': PosixPath('/tmp/.mount_NormCaZT6zGv/usr/app/normcap/resources'), 'tesseract_path': PosixPath('/tmp/.mount_NormCaZT6zGv/usr/bin/tesseract'), 'tessdata_path': PosixPath('/home/user/.config/normcap/tessdata'), 'envs': {'TESSDATA_PREFIX': None, 'LD_LIBRARY_PATH': None}, 'screens': [Screen(left=0, top=0, right=1365, bottom=767, device_pixel_ratio=1.0, index=0, screenshot=None)]}
13:00:38 - DEBUG   - normcap.gui.settings:162 - Skip update of non existing setting (show_introduction: None)
13:00:38 - DEBUG   - normcap.gui.settings:162 - Skip update of non existing setting (cli_mode: False)
13:00:38 - DEBUG   - normcap.gui.settings:162 - Skip update of non existing setting (background_mode: False)
13:00:38 - DEBUG   - normcap.gui.settings:162 - Skip update of non existing setting (clipboard_handler: None)
13:00:38 - DEBUG   - normcap.gui.tray:388 - Listen on local socket v0.5.4-normcap.
13:00:38 - DEBUG   - normcap.screengrab.main:20 - Select capture method QT
13:00:38 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/2024-01-18_10-00-38_raw_screen0.png
13:00:38 - DEBUG   - normcap.gui.window:52 - Create window for screen 0
13:00:38 - DEBUG   - normcap.gui.window:128 - Set window of screen 0 to fullscreen
13:00:38 - DEBUG   - normcap:183 - [QT] qtwarningmsg - qsystemtrayicon::setvisible: no icon set
13:00:38 - DEBUG   - normcap.ocr.tesseract:24 - Executing '/tmp/.mount_NormCaZT6zGv/usr/bin/tesseract --list-langs --tessdata-dir /home/user/.config/normcap/tessdata'
13:00:38 - DEBUG   - normcap.ocr.tesseract:37 - Tesseract command output: List of available languages in "/home/user/.config/normcap/tessdata/" (6): ¬ ara ¬ chi_sim ¬ deu ¬ eng ¬ rus ¬ spa ¬
13:00:44 - DEBUG   - normcap.gui.tray:354 - Hide 1 window
13:00:44 - INFO    - normcap.gui.tray:246 - Crop image to region (375, 428, 666, 463)
13:00:44 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/2024-01-18_10-00-44_cropped.png
13:00:44 - DEBUG   - normcap.gui.tray:271 - Start OCR
13:00:44 - DEBUG   - normcap.ocr.enhance:84 - Scale image x2
13:00:44 - DEBUG   - normcap.ocr.enhance:57 - Pad image by 80px
13:00:44 - DEBUG   - normcap.ocr.recognize:35 - Run Tesseract on image of size (744, 232) with args:
TessArgs(tessdata_path=PosixPath('/home/user/.config/normcap/tessdata'), lang='ara', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO: 3>)
13:00:44 - DEBUG   - normcap.ocr.tesseract:24 - Executing '/tmp/.mount_NormCaZT6zGv/usr/bin/tesseract /tmp/tmphgsz7_tg/normcap_tesseract_input.png /tmp/tmphgsz7_tg/normcap_tesseract_input.png -c tessedit_create_tsv=1 -l ara --oem 3 --psm 3 --tessdata-dir /home/user/.config/normcap/tessdata -c tessedit_write_images=1 -c tessedit_dump_pageseg_images=1'
13:00:44 - DEBUG   - normcap.ocr.tesseract:37 - Tesseract command output: 
13:00:44 - DEBUG   - normcap.ocr.tesseract:67 - Skip moving file to temp dir, it does not exist: /tmp/tmphgsz7_tg/normcap_tesseract_input.png.png_debug.pdf
13:00:44 - DEBUG   - normcap.ocr.recognize:44 - OCR result:
OcrResult(tess_args=TessArgs(tessdata_path=PosixPath('/home/user/.config/normcap/tessdata'), lang='ara', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO: 3>), words=[{'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 1, 'left': 499, 'top': 96, 'width': 152, 'height': 26, 'conf': 4.121132, 'text': 'ع1939ممة.'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 2, 'left': 356, 'top': 96, 'width': 132, 'height': 27, 'conf': 41.292465, 'text': '86_64)-4'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 3, 'left': 337, 'top': 110, 'width': 3, 'height': 6, 'conf': 74.659691, 'text': '.'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 4, 'left': 318, 'top': 96, 'width': 8, 'height': 20, 'conf': 85.199936, 'text': '5'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 5, 'left': 299, 'top': 110, 'width': 5, 'height': 6, 'conf': 59.467552, 'text': '.'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 6, 'left': 117, 'top': 76, 'width': 170, 'height': 51, 'conf': 0.0, 'text': '6-موعمعهاة/'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 7, 'left': 99, 'top': 110, 'width': 7, 'height': 6, 'conf': 67.63858, 'text': '.'}], image=<PySide6.QtGui.QImage(QSize(744, 232),format=QImage::Format_RGB32,depth=32,devicePixelRatio=1,bytesPerLine=2976,sizeInBytes=690432) at 0x7f7a63936fc0>, magic_scores={}, parsed='')
13:00:44 - INFO    - normcap.ocr.magics.email_magic:60 - 0 emails found 
13:00:44 - DEBUG   - normcap.ocr.magics.email_magic:71 - 0/32 (0.0) chars in emails
13:00:44 - INFO    - normcap.ocr.magics.url_magic:57 - 0 URLs found 
13:00:44 - DEBUG   - normcap.ocr.magics.url_magic:65 - 0/38 (0.0) chars in urls
13:00:44 - DEBUG   - normcap.ocr.magic:82 - Magic scores:
{'SingleLineMagic': 50, 'MultiLineMagic': 0, 'ParagraphMagic': 0.0, 'EmailMagic': 0.0, 'UrlMagic': 0.0}
13:00:44 - DEBUG   - normcap.ocr.recognize:48 - Parsed text:
ع1939ممة. 86_64)-4 . 5 . 6-موعمعهاة/ .
13:00:44 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/2024-01-18_10-00-44_enhanced.png
13:00:44 - INFO    - normcap.gui.tray:289 - Text from OCR:
ع1939ممة. 86_64)-4 . 5 . 6-موعمعهاة/ .
13:00:44 - DEBUG   - normcap.gui.tray:332 - Copy text to clipboard
13:00:44 - DEBUG   - normcap.clipboard.handlers.windll:187 - WindllHandler is incompatible on non-Windows systems
13:00:44 - DEBUG   - normcap.clipboard.handlers.pbcopy:24 - PbCopyHandler is incompatible on non-macOS systems
13:00:44 - DEBUG   - normcap.clipboard.handlers.qtclipboard:34 - QtCopyHandler is compatible
13:00:44 - DEBUG   - normcap.clipboard.handlers.wlclipboard:34 - WlCopyHandler is not compatible on non-Linux systems and on Linux w/o Wayland
13:00:44 - DEBUG   - normcap.clipboard.handlers.xclip:38 - XclipCopyHandler is compatible
13:00:44 - DEBUG   - normcap.clipboard.main:84 - Compatible clipboard handlers: ['qt', 'xclip']
13:00:44 - DEBUG   - normcap.clipboard.handlers.qtclipboard:38 - QtCopyHandler requires no dependencies
13:00:44 - DEBUG   - normcap.clipboard.handlers.xclip:46 - XclipCopyHandler dependencies are installed (/tmp/.mount_NormCaZT6zGv/usr/bin/xclip)
13:00:44 - DEBUG   - normcap.clipboard.main:89 - Available clipboard handlers: ['qt', 'xclip']
13:00:44 - DEBUG   - normcap.clipboard.handlers.qtclipboard:34 - QtCopyHandler is compatible
13:00:44 - DEBUG   - normcap.clipboard.main:56 - Text copied to clipboard using 'qt.' handler
13:00:44 - DEBUG   - normcap.gui.notification:132 - Send notification via notify-send
13:00:49 - INFO    - normcap.gui.tray:610 - Exit normcap
13:00:49 - DEBUG   - normcap.gui.tray:611 - Debug images saved in /tmp/normcap
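The telling detail in the log is that Tesseract was invoked with `lang='ara'` (`-l ara`), although `eng` is listed among the installed languages. Below is a minimal sketch for pulling that value out of a saved debug log. It assumes the log line format shown above. The regexes and the function name `ocr_languages_from_log` are hypothetical helpers and are not part of NormCap.

```python
import re

# Matches the language in NormCap's "TessArgs(... lang='ara' ...)" debug entries.
TESSARGS_LANG = re.compile(r"TessArgs\(.*?lang='([^']*)'")
# Matches the "-l <lang>" flag in a logged Tesseract command line.
# "--list-langs" does not match, because "-l" must follow whitespace directly.
CLI_LANG = re.compile(r"\s-l\s+(\S+)")


def ocr_languages_from_log(log_text: str) -> dict:
    """Collect the OCR languages that a NormCap debug log shows being used.

    Returns a dict with the languages found in TessArgs(...) entries and
    in the '-l' flag of executed Tesseract commands.
    """
    found = {"tess_args": set(), "cli_flag": set()}
    for line in log_text.splitlines():
        found["tess_args"].update(TESSARGS_LANG.findall(line))
        # Only command lines ("Executing '...tesseract ...'") carry the -l flag.
        if "Executing" in line:
            found["cli_flag"].update(CLI_LANG.findall(line))
    return found
```

If the log above is saved to a text file, `ocr_languages_from_log(Path(...).read_text())` should report `{'ara'}` in both sets. That would confirm the stored language selection, not the image, is the problem.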
faveoled commented 5 months ago

Fixed by removing ~/.config/normcap. Note that I never edited the files there. I had used your app for some time with version 0.3.
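Moving the directory aside instead of deleting it keeps a copy you can restore. Here is a minimal sketch, meant to be run with NormCap closed. The helper name `backup_config_dir` and the `.bak-<timestamp>` naming are hypothetical choices, not something NormCap provides.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_config_dir(config_dir: Path) -> Optional[Path]:
    """Rename config_dir to a timestamped sibling, e.g. 'normcap.bak-20240118-100038'.

    Returns the backup path, or None if config_dir does not exist.
    """
    if not config_dir.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = config_dir.with_name(f"{config_dir.name}.bak-{stamp}")
    # A move (rename) keeps every file, so the old settings can be restored later.
    shutil.move(str(config_dir), str(backup))
    return backup
```

For the path in this issue, call `backup_config_dir(Path.home() / ".config" / "normcap")`. In the log above, `tessdata_path` also points into this directory, so the downloaded language files move into the backup together with the settings.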

dynobo commented 5 months ago

Yeah, sorry about that; this was a known issue when updating from 0.3 to 0.4 (or later). 🙈

#372 explains the details.

For anyone else reading this issue: before deleting the config files, try toggling Arabic (ARA) on and off in the Settings menu, then toggle English (ENG) off and on. That should do the trick, too.
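To check outside NormCap that English recognition works with the installed models, you can run Tesseract directly on the saved debug crop (e.g. `/tmp/normcap/2024-01-18_10-00-44_cropped.png`). Below is a minimal sketch under several assumptions. The helper names are hypothetical. The `tesseract` binary is either on `PATH` or passed explicitly (e.g. the AppImage's internal path from the log). The flags mirror the command NormCap logged, and `+` joins several languages in Tesseract's `-l` syntax. How NormCap itself combines multiple selected languages is not shown here.

```python
import subprocess
from pathlib import Path


def available_languages(list_langs_output: str) -> list:
    """Parse `tesseract --list-langs` output; the first line is a header."""
    lines = [ln.strip() for ln in list_langs_output.splitlines()]
    return [ln for ln in lines[1:] if ln]


def build_tesseract_cmd(image: Path, languages: list, available: list,
                        tessdata_dir: Path, tesseract: str = "tesseract") -> list:
    """Build a Tesseract command that prints recognized text to stdout."""
    missing = [lang for lang in languages if lang not in available]
    if missing:
        raise ValueError(f"language(s) not installed: {missing}")
    return [tesseract, str(image), "stdout",
            "-l", "+".join(languages),  # e.g. "eng" or "eng+ara"
            "--oem", "3", "--psm", "3",
            "--tessdata-dir", str(tessdata_dir)]


def run_ocr(cmd: list) -> str:
    """Run the command and return the recognized text (raises on a non-zero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

If `-l eng` returns readable English and `-l ara` reproduces the Arabic output from the log, the models are fine and only the stored language selection needs fixing.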