dynobo / normcap

OCR powered screen-capture tool to capture information instead of images
https://dynobo.github.io/normcap/

NormCap AppImage Failing at OCR #556

Closed Lonniebiz closed 7 months ago

Lonniebiz commented 7 months ago

What happened?

Do video card drivers affect NormCap? I'm running Nvidia drivers on Debian 12, and when I try to OCR something with the NormCap 0.4.4 AppImage, the characters copied to the clipboard are nothing like what I captured.

I'm just capturing very plain-looking text as a test, but I'm getting a clipboard full of text so strange that GitHub cannot render it in a code block. I've attached that text as a file (crazy.txt). It looks nothing like what I captured; try copying and pasting it into GitHub. So WEIRD!

How did you install NormCap?

AppImage

Operating System + Version?

Debian 12

[Linux only] Display Server (DS) + Desktop environment (DE)?

Awesome Window Manager with Nvidia Drivers

dynobo commented 7 months ago

Hi @Lonniebiz, thanks for reporting this issue! That's a very weird result, indeed! :see_no_evil:

Video card drivers should not affect NormCap at all. But your Awesome Window Manager might.

Before we look into that, could you please try the new NormCap 0.5.0-beta1 first? There is a good chance that the bug is already fixed.

If it still doesn't work, please start NormCap in a terminal with the debug flag:

./NormCap-0.5.0-beta1-x86_64.AppImage -v debug

Then perform the problematic capture, exit NormCap, and share the output from the terminal here. That would help a lot with diagnosis. :slightly_smiling_face:

Lonniebiz commented 7 months ago

When I tried running the beta AppImage you specified, the opportunity to capture never occurred. Here's the debug output:

▶ ./NormCap-0.5.0-beta1-x86_64.AppImage -v debug
03:47:26 - INFO    - normcap:49 - Start NormCap v0.5.0-beta1
/tmp/.mount_NormCab6wLzo/AppRun: line 11: 54511 Segmentation fault      "${APPDIR}/usr/python/bin/python3" -u -s -X utf8 -c "import runpy, sys; sys.path.pop(0); runpy.run_module('${BRIEFCASE_MAIN_MODULE}', run_name='__main__', alter_sys=True)" "$@"

The v0.4.4 did allow capture, but the captured text was like the weird stuff I reported earlier. Since the beta didn't give much output, here's a debug output from v0.4.4:

▶ ./NormCap-0.4.4-x86_64.AppImage -v debug
03:40:10 - INFO    - normcap:30 - Start NormCap v0.4.4
03:40:10 - DEBUG   - normcap.gui.tray:60 - System info:
{'cli_args': '/tmp/.mount_NormCaw4cSQV/usr/app/normcap/__main__.py -v debug', 'is_briefcase_package': True, 'is_flatpak_package': False, 'platform': 'linux', 'pyside6_version': '6.5.1', 'qt_version': '6.5.1', 'qt_library_path': '/tmp/.mount_NormCaw4cSQV/usr/app_packages/PySide6/Qt/plugins, /tmp/.mount_NormCaw4cSQV/usr/python/bin', 'config_directory': PosixPath('/home/user/.config/normcap'), 'normcap_version': '0.4.4', 'ressources_path': PosixPath('/tmp/.mount_NormCaw4cSQV/usr/app/normcap/resources'), 'tesseract_path': PosixPath('/tmp/.mount_NormCaw4cSQV/usr/bin/tesseract'), 'tessdata_path': PosixPath('/home/user/.config/normcap/tessdata'), 'envs': {'TESSDATA_PREFIX': None, 'LD_LIBRARY_PATH': None}, 'desktop_environment': <DesktopEnvironment.OTHER: 0>, 'display_manager_is_wayland': False, 'screens': [Screen(is_primary=True, device_pixel_ratio=2.2916666666666665, rect=Rect(left=3840, top=0, right=5516, bottom=943), index=0, screenshot=None), Screen(is_primary=False, device_pixel_ratio=2.2916666666666665, rect=Rect(left=0, top=0, right=1676, bottom=943), index=1, screenshot=None), Screen(is_primary=False, device_pixel_ratio=2.2916666666666665, rect=Rect(left=7680, top=0, right=9356, bottom=943), index=2, screenshot=None)]}
03:40:10 - DEBUG   - normcap.gui.tray:342 - Listen on local socket v0.4.4-normcap.
03:40:10 - DEBUG   - normcap.gui.settings:128 - Skip update of non existing setting (cli_mode: False)
03:40:10 - DEBUG   - normcap.gui.settings:128 - Skip update of non existing setting (background_mode: False)
03:40:10 - DEBUG   - normcap.screengrab:32 - Select capture method QT
03:40:11 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1701164411.026825_raw_screen0.png
03:40:11 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1701164411.3179169_raw_screen1.png
03:40:11 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1701164411.5614607_raw_screen2.png
03:40:11 - DEBUG   - normcap.gui.window:131 - Create window for screen 0
03:40:11 - DEBUG   - normcap.gui.window:193 - Set window of screen 0 to fullscreen
03:40:11 - DEBUG   - normcap.gui.window:131 - Create window for screen 1
03:40:11 - DEBUG   - normcap.gui.window:193 - Set window of screen 1 to fullscreen
03:40:11 - DEBUG   - normcap.gui.window:131 - Create window for screen 2
03:40:11 - DEBUG   - normcap.gui.window:193 - Set window of screen 2 to fullscreen
03:40:11 - DEBUG   - normcap:213 - [QT] qtwarningmsg - qsystemtrayicon::setvisible: no icon set
03:40:12 - DEBUG   - normcap.ocr.tesseract:23 - Tesseract command output:
List of available languages in "/home/user/.config/normcap/tessdata/" (6):
ara
chi_sim
deu
eng
rus
spa
03:40:12 - DEBUG   - normcap.gui.update_check:113 - Search for new version on https://github.com/dynobo/normcap/releases.atom
03:40:12 - DEBUG   - normcap.gui.downloader:62 - Download https://github.com/dynobo/normcap/releases.atom
03:40:12 - DEBUG   - normcap.gui.downloader:33 - Fallback to ssl without verification
03:40:13 - DEBUG   - normcap.gui.update_check:50 - Newest version: 0.4.4 (installed: 0.4.4)
03:40:19 - DEBUG   - normcap.gui.tray:294 - Hide 3 windows
03:40:19 - INFO    - normcap.gui.tray:197 - Crop image to region (769, 1743, 1532, 1805)
03:40:19 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1701164419.0116553_cropped.png
03:40:19 - DEBUG   - normcap.gui.tray:222 - Start OCR
03:40:19 - DEBUG   - normcap.ocr.enhance:76 - Scale image x3.2
03:40:19 - DEBUG   - normcap.ocr.enhance:54 - Pad image by 80px
03:40:19 - DEBUG   - normcap.ocr.enhance:92 - Invert image
03:40:19 - DEBUG   - normcap.ocr.recognize:35 - Run Tesseract on image of size (2596, 358) with args:
TessArgs(tessdata_path=PosixPath('/home/user/.config/normcap/tessdata'), lang='ara', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO_OSD: 1>)
03:40:19 - DEBUG   - normcap.ocr.tesseract:23 - Tesseract command output:

03:40:19 - DEBUG   - normcap.ocr.recognize:44 - OCR result:
OcrResult(tess_args=TessArgs(tessdata_path=PosixPath('/home/user/.config/normcap/tessdata'), lang='ara', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO_OSD: 1>), words=[{'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 1, 'left': 102, 'top': 88, 'width': 1019, 'height': 78, 'conf': 57.136185, 'text': '001101030'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 2, 'left': 479, 'top': 84, 'width': 28, 'height': 86, 'conf': 88.050911, 'text': '.'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 3, 'left': 538, 'top': 84, 'width': 189, 'height': 86, 'conf': 78.18235, 'text': '01.'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 4, 'left': 775, 'top': 84, 'width': 103, 'height': 86, 'conf': 41.704292, 'text': '2'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 5, 'left': 968, 'top': 84, 'width': 39, 'height': 86, 'conf': 85.134743, 'text': ':'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 6, 'left': 1054, 'top': 84, 'width': 71, 'height': 86, 'conf': 81.012039, 'text': '2'}], image=<PySide6.QtGui.QImage(QSize(2596, 358),format=QImage::Format_RGB32,depth=32,devicePixelRatio=1,bytesPerLine=10384,sizeInBytes=3717472) at 0x7f7396628700>, magic_scores={}, parsed='')
03:40:19 - INFO    - normcap.ocr.magics.email_magic:33 - 0 emails found 
03:40:19 - DEBUG   - normcap.ocr.magics.email_magic:41 - 0/16 (0.0) chars in emails
03:40:19 - INFO    - normcap.ocr.magics.url_magic:55 - 0 URLs found 
03:40:19 - DEBUG   - normcap.ocr.magics.url_magic:63 - 0/21 (0.0) chars in urls
03:40:19 - DEBUG   - normcap.ocr.magics.magic:70 - Magic scores:
{'SingleLineMagic': 50, 'MultiLineMagic': 0, 'ParagraphMagic': 0.0, 'EmailMagic': 0.0, 'UrlMagic': 0.0}
03:40:19 - DEBUG   - normcap.ocr.recognize:48 - Parsed text:
001101030 . 01. 2 : 2
03:40:19 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1701164419.192384_enhanced.png
03:40:19 - INFO    - normcap.gui.tray:240 - Text from OCR:
001101030 . 01. 2 : 2
03:40:19 - DEBUG   - normcap.clipboard.linux:47 - Select clipboard method QT
03:40:19 - DEBUG   - normcap.gui.tray:270 - Copy text to clipboard
03:40:19 - DEBUG   - normcap.gui.notifier:111 - Send notification via QT
03:40:24 - INFO    - normcap.gui.tray:506 - Exit normcap (notification sent delaying exit)
03:40:24 - DEBUG   - normcap.gui.tray:507 - Debug images saved in /tmp/normcap

Additional Information:

NormCap-0.3.9-x86_64.AppImage was doing successful captures prior to upgrading to NormCap-0.4.4-x86_64.AppImage. However, I just tried NormCap-0.3.9-x86_64.AppImage again, just to see if it still works, and I'm sad to report that it no longer works either.

So, what changed? Well, the thing that recently changed was that I installed the nvidia-driver like this:

Install the Nvidia drivers on Debian 12:
  1.) Open this file:
    sudo nano /etc/apt/sources.list

  2.) Make sure this is on the end of the first two lines in that file:
    main contrib non-free non-free-firmware

  3.) sudo apt update ; sudo apt install nvidia-driver firmware-misc-nonfree

  4.) reboot
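
Step 2 above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: the helper name `missing_components` is hypothetical, it assumes the classic one-line `deb ...` format of /etc/apt/sources.list, and it checks every active entry rather than only the first two lines.

```python
REQUIRED = ("main", "contrib", "non-free", "non-free-firmware")


def missing_components(sources_text, required=REQUIRED):
    """Map each active deb/deb-src line number to the components it lacks.

    Hypothetical helper; assumes the one-line sources.list format:
    deb [options] URI suite component1 component2 ...
    """
    report = {}
    for number, line in enumerate(sources_text.splitlines(), start=1):
        # Ignore comments and blank lines.
        fields = line.split("#", 1)[0].split()
        if not fields or fields[0] not in ("deb", "deb-src"):
            continue
        rest = fields[1:]
        # Skip an optional "[option=value ...]" group after the entry type.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        components = rest[2:]  # everything after URI and suite
        missing = [name for name in required if name not in components]
        if missing:
            report[number] = missing
    return report


sample = """\
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://deb.debian.org/debian bookworm-updates main
# deb-src http://deb.debian.org/debian bookworm main
"""
print(missing_components(sample))
# {2: ['contrib', 'non-free', 'non-free-firmware']}
```

To check the real file, pass the contents of /etc/apt/sources.list instead of `sample`; an empty result means every active entry already lists all four components.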

I had an Nvidia graphics card all along, but previously I just settled for the open source video card drivers that Debian 12 provides by default. Those default drivers work, but they're a lot slower.

Another application, that I had trouble with after I installed the Nvidia drivers, was Pulsar. In that case, I was able to work around the issue by passing an argument that made Pulsar do everything in the main thread. That ticket is here.

This additional information may be unrelated, but given that 0.3.9 worked (prior to me installing the nvidia-driver), I figured I should mention it.

Lonniebiz commented 7 months ago

From reading this, my attention was directed to the fact that the debug output shows that Tesseract is set to recognize Arabic (lang='ara' in the TessArgs line).
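
For longer debug logs, the same check can be automated. This is a minimal Python sketch under stated assumptions: the function name `ocr_languages` is hypothetical, and the pattern relies only on the `TessArgs(... lang='...')` text as it appears in the 0.4.4 debug output above; other NormCap versions may log this differently.

```python
import re

# Matches the lang value inside a TessArgs(...) entry on a single log line.
TESS_LANG = re.compile(r"TessArgs\(.*?lang='([^']*)'")


def ocr_languages(log_text):
    """Return the distinct lang values found in TessArgs entries, in order."""
    seen = []
    for lang in TESS_LANG.findall(log_text):
        if lang not in seen:
            seen.append(lang)
    return seen


# Line copied from the v0.4.4 debug output in this issue.
sample = (
    "TessArgs(tessdata_path=PosixPath('/home/user/.config/normcap/tessdata'), "
    "lang='ara', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO_OSD: 1>)"
)
print(ocr_languages(sample))
# ['ara']
```

If the returned value does not match the language of the captured text, the OCR result will likely be garbage, as it was here.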

So, I took a look at the NormCap options, like this: ▶./NormCap-0.4.4-x86_64.AppImage --help

I noticed the option -l, for language:

  -h, --help            show this help message and exit
  -c COLOR, --color COLOR
                        Set primary color for UI, e.g. '#FF2E88'
  -l LANGUAGE [LANGUAGE ...], --language LANGUAGE [LANGUAGE ...]
                        Set language(s) for text recognition, e.g. '-l eng' or '-l eng deu'
  -m {raw,parse}, --mode {raw,parse}
                        Set capture mode
  -n {True,False}, --notification {True,False}
                        Disable or enable notification after ocr detection
  -t {True,False}, --tray {True,False}
                        Disable or enable system tray
  -u {True,False}, --update {True,False}
                        Disable or enable check for updates
  -r, --reset           Reset all settings to default values
  -v {error,warning,info,debug}, --verbosity {error,warning,info,debug}
                        Set level of detail for console output (default: warning)
  --version             Print NormCap version and exit
  --cli-mode            Print text after detection to stdout and exits immediately
  --background-mode     Start minimized to tray, without capturing

When I run the AppImage with the -l eng argument, it's working! ./NormCap-0.4.4-x86_64.AppImage -l eng

I guess this explains why the captures were so weird before; English isn't Arabic.

In the past, I've never had to run NormCap with any options, but now I know to do this going forward.

Thanks for the help!
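
The workaround above can be wrapped in a small launcher so the language is never left to a stale setting. This is a minimal Python sketch, not an official NormCap interface: the helper names `normcap_command` and `run_normcap` are hypothetical, and only the `-l` and `-v` options shown in the `--help` output above are used.

```python
import subprocess


def normcap_command(appimage, languages, verbosity=None):
    """Build the argument list for a NormCap AppImage with explicit languages."""
    if not languages:
        raise ValueError("at least one language code is required")
    command = [appimage, "-l", *languages]
    if verbosity is not None:
        command += ["-v", verbosity]
    return command


def run_normcap(appimage, languages, verbosity=None):
    """Start NormCap and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(
        normcap_command(appimage, languages, verbosity), check=True
    )


print(normcap_command("./NormCap-0.4.4-x86_64.AppImage", ["eng"]))
# ['./NormCap-0.4.4-x86_64.AppImage', '-l', 'eng']
```

Calling `run_normcap("./NormCap-0.4.4-x86_64.AppImage", ["eng"], "debug")` would then correspond to `./NormCap-0.4.4-x86_64.AppImage -l eng -v debug`.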

dynobo commented 7 months ago

Oh, so you recently updated to 0.4.4? Actually, I had https://github.com/dynobo/normcap/issues/372 pinned for 10 months, which explains exactly this issue. I unpinned it just recently, in the hope that by then most people had switched to the new version... :see_no_evil:

Sorry for the inconvenience, I'm glad you figured it out on your own! (Nice detective skills, btw! :detective:)

Lonniebiz commented 7 months ago

Yes, I recently downloaded the 0.4.4 AppImage from here thinking it was the latest. I guess it has been a long while since I checked for a new version. Is 0.4.4 still the latest stable?

dynobo commented 7 months ago

Is 0.4.4 still the latest stable?

Yes, it is. The upcoming 0.5.0 stable release still requires some fixes, which are harder than I thought.

However, if anyone experiences issues with 0.4.4, I usually recommend trying 0.5.0-beta1 already, because the remaining known issues are limited to certain distribution & display setups.

dynobo commented 7 months ago

When I tried running the beta AppImage you specified, the opportunity to capture never occurred. Here's the debug output:

▶ ./NormCap-0.5.0-beta1-x86_64.AppImage -v debug
03:47:26 - INFO    - normcap:49 - Start NormCap v0.5.0-beta1
/tmp/.mount_NormCab6wLzo/AppRun: line 11: 54511 Segmentation fault      "${APPDIR}/usr/python/bin/python3" -u -s -X utf8 -c "import runpy, sys; sys.path.pop(0); runpy.run_module('${BRIEFCASE_MAIN_MODULE}', run_name='__main__', alter_sys=True)" "$@"

For reference: this is a known issue and tracked in #555

dynobo commented 7 months ago

Closing, as it was resolved by re-selecting the proper language (after it got set to "ara" due to #372).