Crivella / ocr_extension

Firefox extension for inplace translation of images using ocr_translate
https://addons.mozilla.org/en-US/firefox/addon/ocr_extension/
GNU General Public License v3.0

Unidic Lite #4

Closed greenlime03 closed 1 day ago

greenlime03 commented 2 weeks ago

Please teach me how to install Unidic Lite. I have already downloaded Unidic Lite, mecab3, and fugashi and tried to run the setup.py, but it still isn't installed.

Reading the Python installation steps makes me even more confused.

NB: I am trying to use CUDA. Does it really take this long to install torch?

Crivella commented 2 weeks ago

Could you point me to the steps you are taking to install these packages? E.g., what do you mean by "run the setup.py"?

In general, when installing a Python package you would want to use its package manager, pip. You can either run pip install PACKAGE_NAME to download it from a repository (by default PyPI), or download the source with either a setup.py or a pyproject.toml inside and run pip install . from that folder.

Regarding this tool, you should not have to manually install any of the packages I provide unless you want to try something non-standard (all the dependencies should be taken care of by the new plugin manager, which uses pip under the hood).

NB: I am trying to use CUDA. Does it really take this long to install torch?

If you mean you are using the plugin manager to install either the huggingface or easyocr plugin, it is possible it will take a while (pytorch with its CUDA deps can be 3~4 GB of stuff). You can check the window where the server is running to see what is being installed. If the logging is not enough, you can always change the log level at this line from INFO to DEBUG.
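To illustrate what switching from INFO to DEBUG changes, here is a minimal sketch using Python's standard logging module. It is not the server's actual logging configuration: the logger name "demo.plugin_manager" and both messages are made up for this example.

```python
import io
import logging


def emit_at_level(level: int) -> str:
    """Log one INFO and one DEBUG message and return what was actually written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

    # Hypothetical logger name, used only for this sketch.
    logger = logging.getLogger("demo.plugin_manager")
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(level)

    logger.info("Installing plugin with dependencies")
    logger.debug("Running pip for one dependency")
    return stream.getvalue()


if __name__ == "__main__":
    print("At INFO:")
    print(emit_at_level(logging.INFO))
    print("At DEBUG:")
    print(emit_at_level(logging.DEBUG))
```

At INFO only the first message is written; at DEBUG both are, which is why the more verbose level is useful when an install appears to hang.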

greenlime03 commented 2 weeks ago

This is the output from the run:

2024-08-27 19:40:49,555 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 64
2024-08-27 19:40:49,555 -    INFO -   django.server:basehttp        - "GET /get_plugin_data/ HTTP/1.1" 200 2029
2024-08-27 19:40:49,560 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6024
2024-08-27 19:40:57,670 -    INFO -     ocr.general:views           - SET LANG: ja, en
2024-08-27 19:40:57,674 -    INFO -   django.server:basehttp        - "POST /set_lang/ HTTP/1.1" 200 2
2024-08-27 19:40:57,687 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 64
2024-08-27 19:40:57,691 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6260
2024-08-27 19:41:02,954 -    INFO -     ocr.general:plugin_manager  - Installing plugin ocr_translate-hugging_face==0.3.0 with dependencies
2024-08-27 19:41:02,959 -    INFO -     ocr.general:plugin_manager  - Installing plugin ocr_translate-easyocr==0.4.1 with dependencies
2024-08-27 19:41:02,962 -    INFO -     ocr.general:plugin_manager  - Installing plugin ocr_translate-tesseract==0.3.0 with dependencies
2024-08-27 19:41:02,965 -    INFO -     ocr.general:plugin_manager  - Installing plugin ocr_translate-paddle==0.2.2 with dependencies
2024-08-27 19:41:02,970 -    INFO -     ocr.general:plugin_manager  - Installing plugin ocr_translate_ollama==0.1.4 with dependencies
2024-08-27 19:41:02,973 -    INFO -     ocr.general:plugin_manager  - Installing plugin ocr_translate-google==0.2.1 with dependencies
2024-08-27 19:41:03,036 -    INFO -   django.server:basehttp        - "POST /manage_plugins/ HTTP/1.1" 200 2
2024-08-27 19:41:03,044 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 64
2024-08-27 19:41:03,047 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6260
2024-08-27 19:41:07,772 -    INFO -     ocr.general:views           - LOAD MODELS: easyocr, kha-white/manga-ocr-base, staka/fugumt-ja-en
2024-08-27 19:41:07,772 -    INFO -     ocr.general:box             - Loading BOX model: easyocr
2024-08-27 19:41:18,743 -    INFO -          plugin:plugin          - Loading BOX model: easyocr
Using CPU. Note: This module is much faster with a GPU.
2024-08-27 19:41:20,026 -    INFO -     ocr.general:ocr             - Loading OCR model: kha-white/manga-ocr-base
2024-08-27 19:41:20,743 -    INFO -          plugin:ved             - Loading OCR VED model: kha-white/manga-ocr-base
2024-08-27 19:41:24,750 -   ERROR -     ocr.general:views           - Failed to load models: The unidic_lite dictionary is not installed. See https://github.com/polm/unidic-lite for installation.
2024-08-27 19:41:24,751 - WARNING -   django.server:basehttp        - "POST /set_models/ HTTP/1.1" 400 115
2024-08-27 19:41:24,760 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 1368
2024-08-27 19:41:24,760 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6267

There is an error about unidic lite. The file downloaded from unidic lite contains a setup.py. I thought running that would install it with Python, but it still didn't get installed. The same goes for mecab3 and fugashi.

This is the result when I tried:

>>> pip install unidic-lite-1.0.8.tar.gz
  File "<stdin>", line 1
    pip install unidic-lite-1.0.8.tar.gz
        ^^^^^^^
SyntaxError: invalid syntax

And this:

>>> pip install unidic-lite
  File "<stdin>", line 1
    pip install unidic-lite
        ^^^^^^^
SyntaxError: invalid syntax

I'm trying to use CUDA because of this message: "Using CPU. Note: This module is much faster with a GPU." I did so by changing to "set DEVICE=cuda", but "installing torch" takes too much time.

Crivella commented 2 weeks ago

This is the result when I tried: >>> pip install unidic-lite-1.0.8.tar.gz

pip is not a command within Python but an executable: you should run it from either CMD or PowerShell on Windows, not from the >>> prompt. I have never tried to install a .tar.gz directly, but I think it should work.
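If you do want to drive pip from Python rather than from CMD or PowerShell, the usual approach is to launch it as a child process. The sketch below shows this; the helper names pip_install_command and pip_install are invented for illustration.

```python
import subprocess
import sys


def pip_install_command(target: str) -> list[str]:
    """Build the command line for installing `target` with pip."""
    # sys.executable is the interpreter running this script, so "-m pip"
    # installs into that same interpreter's environment.
    return [sys.executable, "-m", "pip", "install", target]


def pip_install(target: str) -> int:
    """Run pip in a child process and return its exit code (0 means success)."""
    return subprocess.run(pip_install_command(target), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call pip_install("unidic-lite") to run it.
    print(" ".join(pip_install_command("unidic-lite")))
```

Typing the printed command into a shell is equivalent. Typing pip install ... at the >>> prompt fails because the prompt only accepts Python code.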

The problem is that if you are using the run-user.bat script, just installing the package globally will not work, as you are running the base dependencies from within a virtual environment. You would first need to activate the virtual environment inside the venv folder and then install the dependency. In PowerShell you can do so by running the Activate.ps1 script inside venv/Scripts.
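A quick way to confirm whether the interpreter you are using is the one inside the venv is to ask Python itself. This is a small sketch that uses only the standard library; the function names are made up for the example.

```python
import sys
import sysconfig


def in_virtualenv() -> bool:
    """Return True when this interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def site_packages_dir() -> str:
    """Return the directory where pip puts pure-Python packages for this interpreter."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    print("packages go to:", site_packages_dir())
```

Run it once from a normal shell and once after activating the venv: the reported package directory should change from the global site-packages to the one under the venv folder.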

All the plugin dependencies are installed by the plugin manager under %userprofile%/.ocr_translate/plugins (if not changed from the default), and you should find the unidic_lite folder under generic if it was installed correctly.
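The check described above can be sketched as a short script. It assumes the default location mentioned here (a plugins folder under .ocr_translate in the user profile, with a generic subfolder); the helper names are hypothetical and this is not part of ocr_translate itself.

```python
import os
from pathlib import Path


def default_plugin_dir() -> Path:
    """Default plugin folder, assuming the location was not changed."""
    # %userprofile% on Windows; fall back to the home directory elsewhere.
    base = os.environ.get("USERPROFILE") or str(Path.home())
    return Path(base) / ".ocr_translate" / "plugins"


def find_installed(name: str, plugin_dir: Path) -> list[Path]:
    """Return entries under plugin_dir/generic whose name starts with `name`."""
    generic = plugin_dir / "generic"
    if not generic.is_dir():
        return []
    prefix = name.lower()
    return sorted(p for p in generic.iterdir() if p.name.lower().startswith(prefix))


if __name__ == "__main__":
    hits = find_installed("unidic_lite", default_plugin_dir())
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print("unidic_lite not found under", default_plugin_dir() / "generic")
```

An empty result suggests the plugin manager never finished installing that dependency.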

Setting the device to CUDA should not help with this, as it will only end up reinstalling the torch dependencies with CUDA support.

One thing you can try is to uninstall huggingface from the extension (deselect it and click submit) and then reinstall it.

Worst case, you could try removing/renaming the plugins folder and the plugins.json file under %userprofile%/.ocr_translate and then reinstalling the plugins. If you get to this point, do so with DEBUG logging, so that if there is some error you can report it here and I will look into it.
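Renaming rather than deleting keeps a backup you can restore. Here is a minimal sketch of the renaming step. It assumes the server is stopped first, and the function name and the ".bak-<timestamp>" suffix are choices made for this example only.

```python
import time
from pathlib import Path


def backup_plugin_state(base: Path) -> list[Path]:
    """Rename base/plugins and base/plugins.json so the plugins install fresh."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in ("plugins", "plugins.json"):
        src = base / name
        if src.exists():
            # Timestamped suffix so repeated backups do not collide.
            dst = base / f"{name}.bak-{stamp}"
            src.rename(dst)
            moved.append(dst)
    return moved


if __name__ == "__main__":
    # Assumed default location; adjust if you changed it.
    base = Path.home() / ".ocr_translate"
    for path in backup_plugin_state(base):
        print("renamed to", path)
```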

greenlime03 commented 2 weeks ago

How can I activate the virtual environment? I tried running Activate.ps1 and activate.bat, but the window closed immediately. I tried to start it with PowerShell, but it either closed immediately or opened the .ps1 in Notepad. However, unidic-lite is already installed: "Requirement already satisfied: unidic-lite in c:\users*****\appdata\local\programs\python\python311\lib\site-packages (1.0.8)". Do I need to uninstall it first? If so, how?

Update: I can activate the virtual environment now, by first changing the ExecutionPolicy to Unrestricted (and changing it back to the default afterwards). But the package got installed inside "v0.6.0\venv\Lib\site-packages" and not under ".ocr_translate". I will try later with run-user.bat.

Update: now it shows "Network Error" when I try the sample from ocr_translate, and there is a new warning about sacremoses. What is sacremoses? Will installing it fix this?

2024-08-27 21:30:26,523 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 64
2024-08-27 21:30:26,523 -    INFO -   django.server:basehttp        - "GET /get_plugin_data/ HTTP/1.1" 200 2029
2024-08-27 21:30:26,527 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6024
2024-08-27 21:30:32,569 -    INFO -     ocr.general:views           - SET LANG: ja, en
2024-08-27 21:30:32,573 -    INFO -   django.server:basehttp        - "POST /set_lang/ HTTP/1.1" 200 2
2024-08-27 21:30:32,583 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 64
2024-08-27 21:30:32,586 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6260
2024-08-27 21:30:35,775 -    INFO -     ocr.general:views           - LOAD MODELS: easyocr, kha-white/manga-ocr-base, staka/fugumt-ja-en
2024-08-27 21:30:35,775 -    INFO -     ocr.general:box             - Loading BOX model: easyocr
2024-08-27 21:30:37,937 -    INFO -          plugin:plugin          - Loading BOX model: easyocr
Using CPU. Note: This module is much faster with a GPU.
2024-08-27 21:30:39,181 -    INFO -     ocr.general:ocr             - Loading OCR model: kha-white/manga-ocr-base
2024-08-27 21:30:39,442 -    INFO -          plugin:ved             - Loading OCR VED model: kha-white/manga-ocr-base
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
2024-08-27 21:30:42,193 -    INFO -     ocr.general:tsl             - Loading TSL model: staka/fugumt-ja-en
2024-08-27 21:30:42,206 -    INFO -          plugin:seq2seq         - Loading TSL model: staka/fugumt-ja-en
C:\Users/janua/.ocr_translate/plugins/generic\transformers\models\marian\tokenization_marian.py:194: UserWarning: Recommended: pip install sacremoses.
  warnings.warn("Recommended: pip install sacremoses.")
2024-08-27 21:30:44,100 -    INFO -   django.server:basehttp        - "POST /set_models/ HTTP/1.1" 200 2
2024-08-27 21:30:44,106 -    INFO -   django.server:basehttp        - "GET / HTTP/1.1" 200 6309
2024-08-27 21:30:44,109 -    INFO -   django.server:basehttp        - "GET /get_active_options/ HTTP/1.1" 200 3328

Crivella commented 2 weeks ago

That is another dependency of some models in huggingface.

The initial dependencies (Django, django-ocr_translate and a few others), which are the minimum required to run the server, get installed under venv. Everything else installed by the server itself gets installed under .ocr_translate: it installs each package without automatic dependency resolution in a temporary path, copies the installed files under .ocr_translate, and makes sure Python can find them afterward.
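The stage-then-copy idea can be sketched offline as below. This is only an illustration, not the plugin manager's real code. A one-file package stands in for what a command like pip install --no-deps --target TEMP_DIR PACKAGE_NAME would produce, and adding the folder to sys.path is an assumption about how Python is made to find the files. All function and package names are hypothetical.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def stage_and_publish(package_name: str, source: str, plugin_dir: Path) -> None:
    """Create a one-file package in a temp dir, then copy it into plugin_dir."""
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / package_name
        staged.mkdir()
        # Stand-in for the files an installer would write into the temp path.
        (staged / "__init__.py").write_text(source, encoding="utf-8")
        # Copy the staged files to their final location.
        shutil.copytree(staged, plugin_dir / package_name, dirs_exist_ok=True)


def make_importable(plugin_dir: Path) -> None:
    """Put plugin_dir on sys.path so `import <package>` can find the copy."""
    path = str(plugin_dir)
    if path not in sys.path:
        sys.path.insert(0, path)
    importlib.invalidate_caches()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "generic"
        stage_and_publish("demo_dep", "VALUE = 42\n", target)
        make_importable(target)
        print(importlib.import_module("demo_dep").VALUE)
```

It also shows why a package installed globally or inside venv is a different copy from the one the server installs: each lives in its own directory and is found through a different search path entry.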

This is weird: all of them should have been installed under .ocr_translate/plugins/generic when you first installed the huggingface plugin using the extension.

greenlime03 commented 2 weeks ago

I will try reinstalling everything from scratch tomorrow. Anyway, how do I uninstall unidic-lite? Should I just delete it?

Crivella commented 2 weeks ago

For uninstalling stuff you can use pip uninstall PACKAGE_NAME. You can do so from a normal shell to delete packages from .../appdata/..., or after activating the venv to delete them from .../venv/lib/...

For deleting the deps/plugins installed by the server, you need to delete the plugins folder and the plugins.json file under .ocr_translate.
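Before uninstalling, it can help to see which copy of a package the current interpreter would actually import. This sketch uses only the standard library. The function name is invented for the example, and it reports only what this interpreter sees, not the copies under .ocr_translate unless that folder is on its search path.

```python
import importlib.util
from typing import Optional


def package_location(module_name: str) -> Optional[str]:
    """Return where a top-level module would be imported from, or None if missing."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # Packages list their directories; plain modules only have an origin file.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


if __name__ == "__main__":
    # unidic-lite is imported as unidic_lite.
    for name in ("unidic_lite", "json"):
        print(name, "->", package_location(name))
```

Running it from a normal shell and again inside the activated venv shows which of the two locations holds the package.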

Crivella commented 1 day ago

Closing this as inactive. In case you still have not solved the problem, feel free to re-open the issue.