BoboTiG / ebook-reader-dict

Finally decent dictionaries based on the Wiktionary for your beloved eBook reader. Daily updates & 13 languages supported so far.
http://www.tiger-222.fr/?d=2020/04/17/22/14/21-un-dictionnaire-alternatif-et-complet-pour-votre-liseuse
MIT License

Rendering errors (<chem> and <math>) #1182

Closed Moonbase59 closed 2 years ago

Moonbase59 commented 2 years ago

Note from @BoboTiG: issue tightly coupled to #1183, interesting details can be found there too.


I did a fresh download and render of the EN wiktionary today, and got the following errors:

>>> Loading data/en/data_wikicode-20220120.json ...
>>> Loaded 1,038,672 words from data/en/data_wikicode-20220120.json
<chem> ERROR with ^-N=\overset{+}N=N^- in [azide]
<math> ERROR with \begin{align}\frac{\pi}{2} & = \prod_{n=1}^{\infty} \frac{ 4n^2 }{ 4n^2 - 1 } = \prod_{n=1}^{\infty} \left(\frac{2n}{2n-1} \cdot \frac{2n}{2n+1}\right) \\[6pt]& = \Big(\frac{2}{1} \cdot \frac{2}{3}\Big) \cdot \Big(\frac{4}{3} \cdot \frac{4}{5}\Big) \cdot \Big(\frac{6}{5} \cdot \frac{6}{7}\Big) \cdot \Big(\frac{8}{7} \cdot \frac{8}{9}\Big) \cdot \; \cdots \\\end{align} in [Wallis product]
<math> ERROR with \begin{align}a_0 &+ a_1x + a_2x^2 + a_3x^3 + \cdots + a_nx^n \\ &= a_0 + x \bigg(a_1 + x \Big(a_2 + x \big(a_3 + \cdots + x(a_{n-1} + x \, a_n) \cdots \big) \Big) \bigg).\end{align} in [Horner's rule]
<math> ERROR with \frac = \frac in [circle of Apollonius]
<math> ERROR with \begin{align}\rho(g, h) (0,x_1,\ldots,x_k) &= g(x_1,\ldots,x_k) \\\rho(g, h) (y+1,x_1,\ldots,x_k) &= h(y,\rho(g, h) (y,x_1,\ldots,x_k),x_1,\ldots,x_k)\,\end{align} in [primitive recursion]
>>> Saved 697,169 words into data/en/data-20220120.json
>>> Render done!

BoboTiG commented 2 years ago

We are aware of such issues. Most math and chem expressions can be converted to GIF, but some do not pass our LaTeX parser.

Any help is welcome. I bet Wikimedia is using specific modules for that.

You can find more info on #1096.

Moonbase59 commented 2 years ago

Oh well, I was expecting such problems. Formulae are always a problem. Unfortunately, readers don’t usually use MathJax or MathML (although that should work in EPUB3).

So transformation is always a big issue, especially since devices have such differing display ppi, making small images often absolutely illegible. Are you actually using (La)TeX to produce the GIFs?

Their quality is not too good. I wonder if we could eventually switch to 8-bit transparent PNGs instead, to get slightly better output. Does anyone know how well that is supported on readers?

(Just tried a few, stripping all metadata inside and saving as 8-bit grayscale+alpha PNG; they aren’t that much bigger. Example: 194 bytes → 209 bytes.)

BoboTiG commented 2 years ago

Actually, I think Kobo only supports GIFs. I need to check again though.

lasconic commented 2 years ago

dictgen mentions GIF and JPG: https://pgaskin.net/dictutil/dictgen/ In theory, Kobo should support PNG (https://help.kobo.com/hc/fr/articles/360017763713-Formats-de-fichiers-pris-en-charge-par-votre-application-Kobo-eReader-et-Kobo-Books), but I am not sure whether that support extends to dictionaries.

lasconic commented 2 years ago

"azide" is new, but the other math expression errors are known: https://github.com/BoboTiG/ebook-reader-dict/issues/1096

One way to debug: add "-d -1" and print the exception:

except Exception as e:
    print(e)

lasconic commented 2 years ago

I just tested and PNGs work. Not sure exactly what sort of PNG it is... I just changed https://github.com/BoboTiG/ebook-reader-dict/blob/794a7236d46fd91f57cd52c8fe428c635f695ae1/wikidict/utils.py#L498 and https://github.com/BoboTiG/ebook-reader-dict/blob/794a7236d46fd91f57cd52c8fe428c635f695ae1/wikidict/utils.py#L503

and replaced "gif" by "png".

Then I created an EN dict with only "graph" as a word:

mkdir test_wik
python -m wikidict en --gen-dict=graph --output=test_wik

Resulting dictionary in kobo format: dicthtml-en-en.zip (3,939 bytes)

With gif: dicthtml-en-en.zip (4,047 bytes)

Tested on Kobo Aura with latest firmware 4.31.19086

Moonbase59 commented 2 years ago

Sounds great, thanks for testing. Must get a Kobo soon… What imaging lib does it use? Maybe we can find out more (like how to specify 8-bit greyscale, alpha, no metadata) to keep them small.

I wonder if we could even make it use an SVG. That would be the best (scalable). Some experimenting to do here, I guess.

lasconic commented 2 years ago

I was also curious whether SVG could be used on Kobo. And it can! Resulting dictionary: dicthtml-en-en.zip (11,819 bytes)

    dvioptions = ["-d -1"]
    with BytesIO() as buf:
        # Render the LaTeX expression as SVG into an in-memory buffer
        preview(
            f"${expr}$",
            output="svg",
            viewer="BytesIO",
            outputbuffer=buf,
            dvioptions=dvioptions,
            packages=tuple(packages),
        )

        buf.seek(0)
        raw = buf.read()

    # Embed the SVG bytes in the definition as a base64 data URI
    return f'<img style="{IMG_CSS}" src="data:image/svg+xml;base64,{b64encode(raw).decode()}"/>'
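The embedding step on its own can be tried without a LaTeX toolchain. The following is a small sketch under stated assumptions: `svg_to_img_tag` is a hypothetical helper, the `IMG_CSS` value is a placeholder rather than the project's actual style, and the SVG is hand-written instead of rendered from a formula.

```python
from base64 import b64decode, b64encode

# Placeholder style; the real IMG_CSS value in the project is not shown here.
IMG_CSS = "height:1em"


def svg_to_img_tag(svg: str, css: str = IMG_CSS) -> str:
    """Wrap SVG markup in an <img> tag using a base64 data URI."""
    payload = b64encode(svg.encode("utf-8")).decode("ascii")
    return f'<img style="{css}" src="data:image/svg+xml;base64,{payload}"/>'


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<circle cx="5" cy="5" r="4"/></svg>'
)
tag = svg_to_img_tag(svg)

# Round trip: the original markup can be recovered from the tag.
payload = tag.split("base64,")[1].split('"')[0]
recovered = b64decode(payload).decode("utf-8")
```

Base64 makes the payload roughly a third larger than the raw SVG, which is consistent with the SVG dictionary above being bigger than the PNG and GIF ones.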

PNG: screen_001

GIF: screen_003

SVG: screen_005

BoboTiG commented 2 years ago

Ooooohhhh I am in love with SVG! Why didn't we try this sooner? :D

BoboTiG commented 2 years ago

If we go the SVG way, we also need to check what the output is when PyGlossary handles the word, and how it finally looks (cc @Moonbase59). Could you share the StarDict file, @lasconic?

BoboTiG commented 2 years ago

Looking again at examples, GIF & PNG seem so archaic now :o

lasconic commented 2 years ago

Unfortunately, PyGlossary is not happy:

Traceback (most recent call last):
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/glossary.py", line 905, in _read
    reader.open(filename)
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/plugins/ebook_kobo_dictfile.py", line 71, in open
    TextGlossaryReader.open(self, filename)
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/text_reader.py", line 84, in open
    self._open(filename)
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/text_reader.py", line 80, in _open
    self.loadInfo()
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/text_reader.py", line 131, in loadInfo
    self._pendingEntries.append(self.newEntry(word, defi))
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/text_reader.py", line 113, in newEntry
    return self._glos.newEntry(
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/glossary.py", line 742, in newEntry
    return Entry(
  File "ebook-reader-dict/isoEnv/lib/python3.9/site-packages/pyglossary/entry.py", line 285, in __init__
    raise TypeError(f"invalid defi type {type(defi)}")
TypeError: invalid defi type <class 'tuple'>
Reading file 'test_wik/dict-en-en.df' failed.

BoboTiG commented 2 years ago

Actually the error is present on the main branch too, meaning it is not SVG-related. (Let's add a simple test to cover the use case ;)

Moonbase59 commented 2 years ago

WOW! Thanks so much for trying, and for the screenshot comparisons. Maybe we should generate something and post it to MobileRead and/or E-Reader Forum, to get some other actual users to try it.

The SVG looks so much better, and hopefully on any device…

Moonbase59 commented 2 years ago

check what is the output when PyGlossary handles the word

We probably need to talk to @ilius about supporting writing out lots of small SVGs instead ;-) Plus, of course, it should not destroy anything that might be in a real dictionary, like JPG/PNG images (real dicts have them: Cambridge has lots of JPGs, and the German Duden even has PDFs). In the far future, we might just wish to include images from Wiktionary, after all…

ilius commented 2 years ago

Converting a bitmap format (like PNG or GIF) to SVG (scalable vector graphics) is not on the cards, really. I'm not sure how to explain why.

Moonbase59 commented 2 years ago

Depends … where do you convert from? The svg would be there already in the dict, base64-encoded. Could that not just be taken and written out? See https://github.com/BoboTiG/ebook-reader-dict/issues/1182#issuecomment-1027245425

Of course trying to convert a raster image to SVG makes no sense.
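To illustrate what "taken and written out" could mean, here is a rough sketch that pulls base64-encoded SVGs out of a definition and replaces each data URI with a file name. This is not PyGlossary's implementation; `extract_svgs`, the `DATA_URI` pattern and the `img_N.svg` naming scheme are all assumptions made for the example.

```python
import re
from base64 import b64decode, b64encode

# Matches src attributes that carry an SVG as a base64 data URI.
DATA_URI = re.compile(r'src="data:image/svg\+xml;base64,([A-Za-z0-9+/=]+)"')


def extract_svgs(definition: str, prefix: str = "img") -> tuple[str, dict[str, bytes]]:
    """Return the rewritten definition and a {file name: SVG bytes} mapping."""
    files: dict[str, bytes] = {}

    def _swap(match: re.Match) -> str:
        # Hypothetical naming scheme: img_0.svg, img_1.svg, ...
        name = f"{prefix}_{len(files)}.svg"
        files[name] = b64decode(match.group(1))
        return f'src="{name}"'

    return DATA_URI.sub(_swap, definition), files


svg_bytes = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
html = f'<p>x <img src="data:image/svg+xml;base64,{b64encode(svg_bytes).decode()}"/></p>'
new_html, files = extract_svgs(html)
```

No raster-to-vector conversion is involved: the bytes are copied out exactly as they were embedded.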

lasconic commented 2 years ago

It should work already: https://github.com/ilius/pyglossary/blob/e864fa4cd29bcba024dc10e6b93eda259c228449/pyglossary/image_utils.py#L13

BoboTiG commented 2 years ago

Let's move the conversation to #1183. It is starting to be hard to follow :)