Failed to load OCR model on Windows Prebuilt

arihid commented 5 months ago

This error always shows up whenever I run the prebuilt Windows binary. I have downloaded the OCR model and copied it to the cache location.

Here's the log:

- Program Information -
Program: Panel Cleaner 2.3.0
Executing from C:\Users\<username>\AppData\Local\Temp\_MEI154802\pcleaner\gui\launcher.pyc
Log file is C:\Users\<username>\AppData\Roaming\pcleaner\cache\pcleaner.log
Config file is C:\Users\<username>\AppData\Roaming\pcleaner\pcleanerconfig.ini
Cache directory is C:\Users\<username>\AppData\Roaming\pcleaner\cache
- System Information -
Operating System: Windows 10
Machine: AMD64
Python Version: 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
PySide (Qt) Version: 6.6.0
Available Qt Themes: windowsvista, Windows, Fusion
System locale: en_US
CPU Cores: 16
GPU: None (CUDA not available)

2024-03-01 15:27:40.095 | INFO     | pcleaner.gui.launcher:launch:73 - Using locale en_US.
2024-03-01 15:27:40.098 | DEBUG    | pcleaner.gui.launcher:launch:80 - Loaded built-in Qt translations for en_US.
2024-03-01 15:27:40.098 | DEBUG    | pcleaner.gui.launcher:launch:88 - Loaded built-in Qt base translations for en_US.
2024-03-01 15:27:40.099 | DEBUG    | pcleaner.gui.launcher:launch:98 - Loaded App translations for en_US.
2024-03-01 15:27:40.697 | DEBUG    | pcleaner.gui.mainwindow_driver:ensure_models_downloaded:318 - Text detector model already downloaded.
2024-03-01 15:27:40.697 | DEBUG    | pcleaner.gui.mainwindow_driver:start_initialization_worker:502 - Worker Thread cleaning cache
2024-03-01 15:27:40.697 | DEBUG    | pcleaner.gui.mainwindow_driver:start_initialization_worker:509 - Worker Thread loading OCR model.
2024-03-01 15:27:40.698 | DEBUG    | pcleaner.gui.mainwindow_driver:initialize_ui:185 - Purging missing profiles.
2024-03-01 15:27:40.698 | INFO     | pcleaner.gui.mainwindow_driver:initialize_profiles:735 - Found profiles: [('Default', None)]
2024-03-01 15:27:40.699 | DEBUG    | pcleaner.config:load_profile:961 - Loading profile None...
2024-03-01 15:27:40.699 | INFO     | manga_ocr.ocr:__init__:13 - Loading OCR model from kha-white/manga-ocr-base
2024-03-01 15:27:40.699 | DEBUG    | pcleaner.config:load_profile:968 - Loading builtin default profile
2024-03-01 15:27:40.722 | DEBUG    | pcleaner.gui.mainwindow_driver:load_current_profile:893 - Loading current profile.
2024-03-01 15:27:40.723 | DEBUG    | pcleaner.gui.profile_parser:set_profile_values:393 - Setting profile values
2024-03-01 15:27:40.729 | DEBUG    | pcleaner.gui.mainwindow_driver:initialize_analytics_view:568 - Loading included font from C:\Users\<username>\AppData\Local\Temp\_MEI154802\pcleaner\data\NotoMono-Regular.ttf
2024-03-01 15:27:40.730 | DEBUG    | pcleaner.gui.mainwindow_driver:initialize_analytics_view:571 - Loaded included font
2024-03-01 15:27:40.732 | DEBUG    | pcleaner.gui.mainwindow_driver:save_default_palette:125 - Placeholder color: #000000
2024-03-01 15:27:40.733 | INFO     | pcleaner.gui.mainwindow_driver:set_theme:147 - Using theme: breeze-dark
2024-03-01 15:27:40.765 | INFO     | pcleaner.gui.mainwindow_driver:set_theme:159 - Theme is dark: True
2024-03-01 15:27:40.945 | DEBUG    | pcleaner.gui.mainwindow_driver:post_init:367 - Char width: 6, columns: 74, required width: 444
2024-03-01 15:27:40.947 | DEBUG    | pcleaner.gui.mainwindow_driver:post_init:398 - Splitter sizes: [400, 918, 460]
2024-03-01 15:27:51.073 | CRITICAL | pcleaner.gui.mainwindow_driver:generic_worker_error:536 - Failed to load OCR model. OCR impossible, moderate cleaning impact.

Encountered error:
Traceback (most recent call last):

  File "urllib3\connectionpool.py", line 791, in urlopen

  File "urllib3\connectionpool.py", line 492, in _make_request

  File "urllib3\connectionpool.py", line 468, in _make_request

  File "urllib3\connectionpool.py", line 1097, in _validate_conn

  File "urllib3\connection.py", line 642, in connect

  File "urllib3\connection.py", line 783, in _ssl_wrap_socket_and_match_hostname

  File "urllib3\util\ssl_.py", line 471, in ssl_wrap_socket

  File "urllib3\util\ssl_.py", line 515, in _ssl_wrap_socket_impl

  File "ssl.py", line 517, in wrap_socket

  File "ssl.py", line 1075, in _create

  File "ssl.py", line 1346, in do_handshake

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "requests\adapters.py", line 486, in send

  File "urllib3\connectionpool.py", line 845, in urlopen

  File "urllib3\util\retry.py", line 470, in increment

  File "urllib3\util\util.py", line 38, in reraise

  File "urllib3\connectionpool.py", line 791, in urlopen

  File "urllib3\connectionpool.py", line 492, in _make_request

  File "urllib3\connectionpool.py", line 468, in _make_request

  File "urllib3\connectionpool.py", line 1097, in _validate_conn

  File "urllib3\connection.py", line 642, in connect

  File "urllib3\connection.py", line 783, in _ssl_wrap_socket_and_match_hostname

  File "urllib3\util\ssl_.py", line 471, in ssl_wrap_socket

  File "urllib3\util\ssl_.py", line 515, in _ssl_wrap_socket_impl

  File "ssl.py", line 517, in wrap_socket

  File "ssl.py", line 1075, in _create

  File "ssl.py", line 1346, in do_handshake

urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "huggingface_hub\file_download.py", line 1232, in hf_hub_download

  File "huggingface_hub\utils\_validators.py", line 118, in _inner_fn

  File "huggingface_hub\file_download.py", line 1599, in get_hf_file_metadata

  File "huggingface_hub\file_download.py", line 417, in _request_wrapper

  File "huggingface_hub\file_download.py", line 452, in _request_wrapper

  File "huggingface_hub\utils\_http.py", line 258, in http_backoff

  File "requests\sessions.py", line 589, in request

  File "requests\sessions.py", line 703, in send

  File "huggingface_hub\utils\_http.py", line 63, in send

  File "requests\adapters.py", line 501, in send

requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)), '(Request ID: 42b065a9-e571-4bfe-8539-7706262c7eda)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "transformers\utils\hub.py", line 430, in cached_file

  File "huggingface_hub\utils\_validators.py", line 118, in _inner_fn

  File "huggingface_hub\file_download.py", line 1349, in hf_hub_download

huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

> File "pcleaner\gui\worker_thread.py", line 141, in run

  File "pcleaner\gui\mainwindow_driver.py", line 523, in load_ocr_model

  File "manga_ocr\ocr.py", line 14, in __init__

  File "transformers\models\auto\feature_extraction_auto.py", line 339, in from_pretrained

  File "transformers\feature_extraction_utils.py", line 498, in get_feature_extractor_dict

  File "transformers\utils\hub.py", line 470, in cached_file

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like kha-white/manga-ocr-base is not the path to a directory containing a file named preprocessor_config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

VoxelCubes commented 5 months ago

So, either you don't have internet access, you don't let the app connect to the internet, or huggingface is currently experiencing an outage. Fair enough. So you decided to download the models yourself instead, correct?

The question is: where did you put the model? (For OCR, this is a whole directory of about five files.) And if you downloaded it as a bundle, did you perhaps forget to unzip it? The download greeter, the very first window that opens when you don't have the models installed, shows the correct place to download the model to. To make it appear again, go to the Help menu and choose to delete the models, or simply delete your config file. Tip: it isn't the "models" folder used by the comictextdetector.

From the log (thank you very much for including it, it has helped a lot) I can see that the model downloader thinks you have OCR installed, but then fails to load it. I'd be curious to know what it is finding there, since it checks for the directory huggingface/hub/models--kha-white--manga-ocr-base, which on Windows defaults to C:\Users\<username>\.cache\huggingface\hub\models--kha-white--manga-ocr-base (unless you change the location of the .cache directory with the environment variable XDG_CACHE_HOME).

Yes, huggingface does things in its own special way; I can't do much about that.
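To see which folder that is on a given machine, here is a minimal Python sketch that follows huggingface_hub's default lookup as I understand it: HF_HOME if set, otherwise XDG_CACHE_HOME plus "huggingface", otherwise ~/.cache/huggingface. The function name is made up for illustration, and it ignores HF_HUB_CACHE (or the older HUGGINGFACE_HUB_CACHE), which can also redirect the hub folder.

```python
import os
from pathlib import Path


def expected_ocr_model_dir() -> Path:
    """Hypothetical helper: where the manga-ocr-base cache folder is expected.

    Assumed lookup order: HF_HOME if set, otherwise
    $XDG_CACHE_HOME/huggingface, otherwise ~/.cache/huggingface.
    HF_HUB_CACHE overrides are deliberately not handled here.
    """
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        base = Path(hf_home).expanduser()
    else:
        cache_root = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        base = Path(cache_root).expanduser() / "huggingface"
    # huggingface_hub names repo folders "models--<owner>--<name>".
    return base / "hub" / "models--kha-white--manga-ocr-base"
```

Calling print(expected_ocr_model_dir()) with no variables set should print the C:\Users\<username>\.cache\... path mentioned above.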

So, assuming your folder structure looks like this:

huggingface
└── hub
   ├── models--kha-white--manga-ocr-base
   │  ├── refs
   │  │  └── main
   │  └── snapshots
   │     └── aa6573bd10b0d446cbf622e29c3e084914df9741
   │        ├── config.json
   │        ├── preprocessor_config.json
   │        ├── pytorch_model.bin
   │        ├── special_tokens_map.json
   │        ├── tokenizer_config.json
   │        └── vocab.txt
   └── version.txt

with the files in the snapshot directory downloaded from https://huggingface.co/kha-white/manga-ocr-base/tree/main (commit hash aa6573bd10b0d446cbf622e29c3e084914df9741). The version.txt contains this:

And the main file contains this:

aa6573bd10b0d446cbf622e29c3e084914df9741

With only that, and no internet, I successfully got Panel Cleaner to load the model. Without the main file, I got the same error you did.
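Since a missing refs/main is enough to trigger this error, a quick sanity check of a hand-built cache folder could look like the Python sketch below. find_cache_problems is a hypothetical helper, not part of Panel Cleaner or huggingface_hub. The file list is copied from the tree above, and the whitespace check rests on the assumption that huggingface_hub reads refs/main verbatim, without stripping a trailing newline.

```python
from pathlib import Path

# File names taken from the snapshot folder in the tree above.
EXPECTED_FILES = (
    "config.json",
    "preprocessor_config.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
)


def find_cache_problems(model_dir: Path) -> list[str]:
    """Hypothetical helper: describe what is missing from a hand-made
    models--kha-white--manga-ocr-base folder. An empty list means it looks complete."""
    problems: list[str] = []
    ref_file = model_dir / "refs" / "main"
    if not ref_file.is_file():
        # Without refs/main the snapshot cannot be resolved offline.
        problems.append(f"missing {ref_file}")
        return problems
    commit = ref_file.read_text()
    if commit != commit.strip():
        # Assumption: the hash is read verbatim, so stray whitespace breaks the lookup.
        problems.append(f"{ref_file} has whitespace or a newline around the hash")
        commit = commit.strip()
    if not commit:
        problems.append(f"{ref_file} is empty")
        return problems
    snapshot = model_dir / "snapshots" / commit
    if not snapshot.is_dir():
        problems.append(f"refs/main points to {commit!r}, but {snapshot} does not exist")
        return problems
    for name in EXPECTED_FILES:
        if not (snapshot / name).is_file():
            problems.append(f"missing {snapshot / name}")
    return problems
```

Passing it the models--kha-white--manga-ocr-base folder should return an empty list once refs/main and all six snapshot files are in place.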

So, if you replicate that structure, it should work for you. It's probably just the main file you were missing, which, yes, is poorly documented; I had to figure it out myself.
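To replicate that structure by hand, a minimal Python sketch like the one below could create the folders and the refs/main file and list direct links for the six files. prepare_manual_cache and download_urls are hypothetical names; the links assume Hugging Face's usual resolve/<revision>/<filename> URL pattern, pinned to the commit above. It does not create version.txt, since its contents aren't shown in this thread.

```python
from pathlib import Path

REPO_ID = "kha-white/manga-ocr-base"
COMMIT = "aa6573bd10b0d446cbf622e29c3e084914df9741"
FILES = (
    "config.json",
    "preprocessor_config.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
)


def prepare_manual_cache(hub_dir: Path) -> Path:
    """Hypothetical helper: create refs/main and the empty snapshot folder.

    hub_dir is the ...\\huggingface\\hub directory. Returns the snapshot
    folder that the downloaded files still have to be copied into.
    """
    model_dir = hub_dir / ("models--" + REPO_ID.replace("/", "--"))
    snapshot = model_dir / "snapshots" / COMMIT
    snapshot.mkdir(parents=True, exist_ok=True)
    refs = model_dir / "refs"
    refs.mkdir(parents=True, exist_ok=True)
    # Write the bare hash with no trailing newline (assumed to be read verbatim).
    (refs / "main").write_text(COMMIT)
    return snapshot


def download_urls() -> list[str]:
    """Direct-download links for each file, pinned to the same commit."""
    return [f"https://huggingface.co/{REPO_ID}/resolve/{COMMIT}/{name}" for name in FILES]
```

Pointing hub_dir at C:\Users\<username>\.cache\huggingface\hub (or wherever the lookup resolves on your machine), then saving each URL from download_urls() into the returned snapshot folder, should give the layout shown above.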

If this works for you, I'll consider making a bit of documentation for this process.

arihid commented 5 months ago

Thanks, it runs without problems now.

VoxelCubes commented 5 months ago

Cool and good!

VoxelCubes / PanelCleaner, issue #72