Closed: Moonbase59 closed this issue 2 years ago.
@ilius I have seen those errors since the beginning. As the files are created by PyGlossary, are such errors expected?
What version of PyGlossary are you using? Try again with the latest tag or the main branch.
We are running the latest version. FTR, those errors were always present; here is an example from 2 months ago: https://github.com/BoboTiG/ebook-reader-dict/runs/4343135917?check_suite_focus=true
I just did not take them into account, too lazy :) But they may be important or minor, and in the latter case maybe the exception could be silenced.
Please make sure ~/.cache/pyglossary/ exists, or try again with the latest tag.
I see https://github.com/ilius/pyglossary/commit/ea1ddf6d58529f212b3a6bcf96394fe09490d145 :)
If the next version does not fix errors, I'll have a look and report any potential improvement/bug to the PyGlossary repo.
That's not a bug fix. 4.4.1 should work too.
@ilius: Is it true that PyGlossary generates the GIFs? If yes, have you ever thought of generating 8-bit grayscale+alpha PNGs instead? They aren't much bigger but might provide cleaner output.
And would you know if that's supported by readers?
Creating ~/.cache/pyglossary/dict-de-de.df_res (specific to that call: python -m wikidict de --convert) does not silence the errors. I'll dig deeper when I find time.
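For reference, the manual workaround tried above (creating the cache directory before running the conversion) can be sketched in Python as follows. The helper name `ensure_cache_dir` is made up for this example, and the directory layout is only what the error messages in this thread show; nothing here is taken from PyGlossary's code.

```python
from pathlib import Path


def ensure_cache_dir(base, name):
    """Create <base>/<name> (and any missing parents) and return the path.

    Hypothetical helper: it only mirrors the manual directory creation
    described in this thread.
    """
    target = Path(base).expanduser() / name
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return target
```

Calling `ensure_cache_dir("~/.cache/pyglossary", "dict-de-de.df_res")` would recreate the directory made by hand above. As noted, that alone did not make the errors go away.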
U-huh, got it! https://github.com/ilius/pyglossary/commit/ecf386b80aa24d34a8dc4f31c13b2eeb79260cd3 That was one of the weirdest bugs I have ever encountered.
I can add an option to convert gif to png if you want.
GIF→PNG wouldn't help much, I think. One of the problems is that the GIF already has a white background, which looks odd in readers using a background color (like GoldenDict). Who/what creates the images in the first place?
As long as we're generating an HTML dict, it might be even better to generate an SVG (for formulae; with a size), but I'm rather unsure about SVG rendering support in dicts. Then again, a reader would typically use its HTML renderer for that, so we might be lucky.
Actually, on Kobo there is no background color. Here is an example: cercle unité. I checked with the dark mode enabled, and still no background displayed.
I think we are talking about 2 kinds of GIFs: the ones generated by the current project (for <math>, <chem>, and some hieroglyphs): https://github.com/BoboTiG/ebook-reader-dict/blob/794a7236d46fd91f57cd52c8fe428c635f695ae1/wikidict/utils.py#L488-L503
And the ones created by PyGlossary.
The former are embedded GIFs, written as <img src="data:image/gif;base64,..."/>. The latter come from PyGlossary taking that information and turning it into real GIF files.
Might be worth looking at how PyGlossary creates those files; maybe there is something to tweak?
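The second step, turning embedded data URIs into real files, can be sketched like this. It is only an illustration of the idea: the function name, the regular expression and the CRC32-based file names are assumptions made for this example, not PyGlossary's actual implementation.

```python
import base64
import re
import zlib

# Matches src="data:image/gif;base64,<payload>" inside an <img> tag.
DATA_URI = re.compile(r'src="data:image/gif;base64,([A-Za-z0-9+/=]+)"')


def extract_inline_gifs(html):
    """Return (rewritten_html, {file_name: gif_bytes}).

    Every embedded GIF is decoded and its data URI is replaced by a
    plain file name that a converter could then write to disk.
    """
    files = {}

    def replace(match):
        data = base64.b64decode(match.group(1))
        name = f"{zlib.crc32(data):08x}.gif"  # hypothetical naming scheme
        files[name] = data
        return f'src="{name}"'

    return DATA_URI.sub(replace, html), files
```

The function leaves the writing of the files to the caller, which makes it easy to check what would be created before touching the disk.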
Ah, interesting. Screenshot from an actual reader? It looks way better than on my GoldenDict (which does use a yellowish background, and thus we get "white blocks"). Too bad my Tolinos support none of the formats we currently generate. Must give KOReader a spin, I guess.
Where do the project-generated GIFs come from? I really wonder if SVG could be done (for scaling on any device/device resolution).
Here is the same word using the dark theme:
And yes, it is a real screenshot from the Kobo Libra H2O.
SVG would be a killer feature, indeed. Not sure about the support though.
Looks like PIL only handles raster-type images. We might be able to use 'LA', though (8-bit grayscale+alpha).
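What 'LA' would buy us can be illustrated without Pillow. In an 'LA' image each pixel is a (luminance, alpha) pair. The sketch below assumes the formula images are dark glyphs on a white background (as described for the GIFs above) and maps plain 8-bit grayscale values to such pairs, so that white becomes fully transparent. `white_to_alpha` is a hypothetical name; this is not code from either project.

```python
def white_to_alpha(gray_pixels):
    """Map 8-bit grayscale values (0 = black, 255 = white) to (L, A) pairs.

    The ink is kept black (L = 0) and the darkness of the original pixel
    becomes its opacity: white -> alpha 0, black -> alpha 255.
    """
    return [(0, 255 - value) for value in gray_pixels]
```

On a coloured or dark reader background, the formerly white pixels would then let the background show through instead of forming white blocks.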
I haven't yet installed FR, can I use a fast command to get only "cercle unité" in a dict, for comparison?
Of course:
mkdir test_wik
python -m wikidict fr --gen-dict='cercle unité' --output=test_wik
The resulting dict can then be found inside the test_wik folder.
You can adapt the command to use a German word (or an English one like graph).
Just love this project for providing a well thought-out foundation! What steps go before? Download/parse/render?
This is a single step; we introduced it to help debug such issues ;)
And since a couple of hours ago, you can also use:
python -m wikidict fr --gen-dict='cercle unité' --output=test_wik --format=stardict
to get a StarDict file instead of a Kobo one.
But it will use an already downloaded dump, right? 'cause I'm just downloading FR :-)
No, it gets the wiki code directly from the web for this article only (or the articles if you pass a comma-separated list of words, just like get-word btw).
Wow! Ok, let me abort and try. D/Ling FR can be done later then.
Oops:
matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --gen-dict='cercle unité' --output=test_wik --format=stardict
>>> Generated dict-fr-fr.df (4,595 bytes)
Traceback (most recent call last):
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/glossary.py", line 905, in _read
reader.open(filename)
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/plugins/ebook_kobo_dictfile.py", line 71, in open
TextGlossaryReader.open(self, filename)
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/text_reader.py", line 84, in open
self._open(filename)
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/text_reader.py", line 80, in _open
self.loadInfo()
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/text_reader.py", line 131, in loadInfo
self._pendingEntries.append(self.newEntry(word, defi))
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/text_reader.py", line 113, in newEntry
return self._glos.newEntry(
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/glossary.py", line 742, in newEntry
return Entry(
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/entry.py", line 285, in __init__
raise TypeError(f"invalid defi type {type(defi)}")
TypeError: invalid defi type <class 'tuple'>
Reading file 'test_wik/dict-fr-fr.df' failed.
>>> Generated dict-fr-fr.zip (22 bytes)
matthias@e6510:~/Projekte/ebook-reader-dict$
I hit the issue too, I am currently looking into it :watch:
The .df looks ok, but the zip is empty.
Looks like a PyGlossary bug. I'm using 4.4.1 and tried to convert the .df manually.
https://github.com/ilius/pyglossary/commit/e864fa4cd29bcba024dc10e6b93eda259c228449 Please try again with latest master.
Okayyy… Next dumb question: How would I install the latest master over my pip3-installed pyglossary, so that it is globally available (in the path) and can also do --ui=gtk?
sudo python3 setup.py install
or
python3 setup.py install --user
FTR I added a test case to reproduce the current error:
# from up-to-date master branch
$ python -m pytest tests/test_5_gen_dict.py -k cercle
It seems that DictFile is causing issues with PyGlossary:
@ cercle unité
: \sɛʁ.kl‿y.ni.te\ <i>m.</i>
<html><p>Des mots <i>cercle</i>, figure géométrique, et <i>unité</i>.</p><br />
<ol><li><i>(Mathématiques)</i></li><ol style="list-style-type:lower-alpha"><li>On appelle cercle unité de <img style="height:100%;max-height:0.8em;width:auto;vertical-align:bottom" src=""/>, l’ensemble des nombres complexes de module égal à 1 : <img style="height:100%;max-height:0.8em;width:auto;vertical-align:bottom" src=""/>. <br> Il apparait alors clairement que <img style="height:100%;max-height:0.8em;width:auto;vertical-align:bottom" src=""/>.</li><li>De même, on appelle cercle unité de <img style="height:100%;max-height:0.8em;width:auto;vertical-align:bottom" src=""/>, l’ensemble <img style="height:100%;max-height:0.8em;width:auto;vertical-align:bottom" src=""/>.</li></ol></ol>
Ok, did a:
pip3 uninstall pyglossary
git clone https://github.com/ilius/pyglossary.git
cd pyglossary
python3 setup.py install --user
Ran command above:
python3 -m wikidict fr --gen-dict='cercle unité' --output=test_wik --format=stardict
Output:
matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --gen-dict='cercle unité' --output=test_wik --format=stardict
>>> Generated dict-fr-fr.df (4,595 bytes)
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/matthias/Projekte/ebook-reader-dict/wikidict/__main__.py", line 122, in <module>
sys.exit(main())
File "/home/matthias/Projekte/ebook-reader-dict/wikidict/__main__.py", line 101, in main
return gen_dict.main(
File "/home/matthias/Projekte/ebook-reader-dict/wikidict/gen_dict.py", line 25, in main
run_formatter(StarDictFormat, *args)
File "/home/matthias/Projekte/ebook-reader-dict/wikidict/convert.py", line 399, in run_formatter
formater.process()
File "/home/matthias/Projekte/ebook-reader-dict/wikidict/convert.py", line 356, in process
self._convert()
File "/home/matthias/Projekte/ebook-reader-dict/wikidict/convert.py", line 327, in _convert
Glossary.init()
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/glossary.py", line 1153, in init
cls.loadPluginsFromJson(pluginsJsonPath)
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/plugin_manager.py", line 53, in loadPluginsFromJson
with open(jsonPath) as _file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/matthias/.local/lib/python3.8/site-packages/plugins-meta/index.json'
Traceback locals:
cls = <class 'pyglossary.glossary.Glossary'>
jsonPath = '/home/matthias/.local/lib/python3.8/site-packages/plugins-met...
len(jsonPath) = 73
json = <module 'json' from '/usr/lib/python3.8/json/__init__.py'>
dirname = <function dirname at 0x7f479ae67820>
join = <function join at 0x7f479ae67550>
Also:
matthias@e6510:~/Projekte/ebook-reader-dict$ pyglossary --ui=gtk
[CRITICAL] Traceback (most recent call last):
File "/home/matthias/.local/bin/pyglossary", line 6, in <module>
main()
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/ui/main.py", line 575, in main
Glossary.init()
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/glossary.py", line 1153, in init
cls.loadPluginsFromJson(pluginsJsonPath)
File "/home/matthias/.local/lib/python3.8/site-packages/pyglossary/plugin_manager.py", line 53, in loadPluginsFromJson
with open(jsonPath) as _file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/matthias/.local/lib/python3.8/site-packages/plugins-meta/index.json'
Please pull and try again.
Please use pip install . -U instead.
That's better :+1:
$ python -m wikidict de --convert
>>> Loading data/de/data-20220120.json ...
>>> Loaded 133,008 words from data/de/data-20220120.json
>>> Generated dict-de-de.df (34,932,810 bytes)
>>> Generated dicthtml-de-de.zip (10,860,965 bytes)
No module named 'pyglossary.plugin_lib.py310'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/39280735.gif'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/39280735.gif'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/490fdc4a.gif'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/76993ec3.gif'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/76993ec3.gif'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/ba1f03ff.gif'
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/f6d31a88.gif'
>>> Generated dict-de-de.zip (10,590,910 bytes)
And even better :muscle:
$ python -m wikidict fr --gen-dict='cercle unité' --output=test_wik --format=stardict
>>> Generated dict-fr-fr.df (4,595 bytes)
No module named 'pyglossary.plugin_lib.py310'
>>> Generated dict-fr-fr.zip (4,246 bytes)
error in DataEntry.save: [Errno 2] No such file or directory: '/home/tiger-222/.cache/pyglossary/dict-de-de.df_res/39280735.gif'
BTW @ilius is it expected that the GIF is not found?
https://github.com/ilius/pyglossary/commit/11b2c3a2ede3a8efde6ce2d7ccc1a424b1ba3bec Please try again. Should not see that error again.
Is there a way to set the number of workers in --render?
Works perfectly, thanks!
Not yet. I'm on it (cf #1199)!
@ilius, you are good to go: --render --workers=N :heavy_check_mark:
Did a quick one using fresh pulls: Fast, no errors on:
$ python3 -m wikidict fr --gen-dict='cercle unité' --output=test_wik --format=stardict
>>> Generated dict-fr-fr.df (4,595 bytes)
>>> Generated dict-fr-fr.zip (4,246 bytes)
Using ilius' GoldenDict theme, we can see why raster images are bad, especially w/o transparency:
Will be trying --workers now. Yesterday I got a load of 14 (!) on a quad-core laptop (8 threads).
Thanks.
I'm still not sure how multiprocessing.Pool() works.
For example, when I pass --workers=2, it correctly results in Pool(processes=2).
But I can see 10 processes (PIDs), 9 of them children, and only 2 of them are running at the same time (the rest are sleeping).
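The behaviour of Pool(processes=N) described above can be checked with a small experiment. The sketch below is not taken from wikidict; `distinct_worker_pids` is a hypothetical helper, and it assumes a POSIX system because it requests the `fork` start method explicitly. It submits many trivial tasks and collects the PIDs of the workers that ran them.

```python
import multiprocessing
import os


def distinct_worker_pids(workers, tasks=50):
    """Run `tasks` trivial jobs on a pool and return the PIDs that served them."""
    # Assumption: POSIX only, since the "fork" start method is requested.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # os.getpid runs inside a worker, so each result is a worker PID.
        results = [pool.apply_async(os.getpid) for _ in range(tasks)]
        return {result.get(timeout=30) for result in results}
```

With workers=2 the returned set should never hold more than two PIDs, so any additional PIDs shown by a process viewer would have to come from somewhere else (threads listed as separate entries, or other child processes). That last part is a guess and has not been confirmed in this thread.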
Using workers=4 here, it jumps between 2 and 4 active. Most of the time, all 4 are active.
It's a memory hog, of course, eating up all 8 GB RAM on my laptop, plus 1 GB of swap.
Is it possible that there's still something wrong with that?
I could previously (no workers) generate a complete dict, although it would use almost all resources on my laptop. Using workers=4 now, it fills up the RAM (122 MB left of 8 GB, 0 bytes left on swap) and swaps itself to death (load average above 40!); I had to pull the plug.
I see workers+1 python3 processes in top, each reserving 2.4 GB RAM. Trying workers=2 now.
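A back-of-the-envelope check of those numbers, as a sketch: it takes the 2.4 GB per process reported by top at face value, which is an assumption (part of that memory may be shared between the parent and its forked children, so this is closer to an upper bound). Both function names are made up for this example.

```python
def total_reserved_gb(workers, per_process_gb):
    """Memory reserved by one parent process plus `workers` children."""
    return (workers + 1) * per_process_gb


def max_workers_for(ram_gb, per_process_gb):
    """Largest worker count whose processes, plus the parent, fit in RAM."""
    processes = int(ram_gb // per_process_gb)
    return max(processes - 1, 0)
```

Under that assumption, workers=4 would reserve about 12 GB on an 8 GB machine, which is consistent with the swapping described above, while `max_workers_for(8, 2.4)` suggests that 2 workers is the most that would fit.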
Can we close this issue?
Note from @BoboTiG: issue tightly coupled to #1182, interesting details can be found there too.
I just downloaded, parsed and rendered the EN Wiktionary, and it apparently has some problems with erroneous and/or missing GIFs:
output.txt
All of the .gif files in data/en/res appear to be very ugly rendered formulae (?).