BoboTiG / ebook-reader-dict

Finally decent dictionaries based on Wiktionary for your beloved eBook reader.
http://www.tiger-222.fr/?d=2020/04/17/22/14/21-un-dictionnaire-alternatif-et-complet-pour-votre-liseuse
MIT License
391 stars, 21 forks

[FR] Parse error #1197

Closed: Moonbase59 closed this issue 2 years ago

Moonbase59 commented 2 years ago

Just downloaded the FR dump (as of 2022-01-20) and trying to parse it:

python3 -m wikidict fr --parse

Output:

>>> Processing data/fr/pages-20220120.xml ...
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/__main__.py", line 122, in <module>
    sys.exit(main())
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/__main__.py", line 60, in main
    return parse.main(args["LOCALE"])
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/parse.py", line 103, in main
    words = process(file, locale)
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/parse.py", line 69, in process
    for element in xml_iter_parse(file):
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/parse.py", line 22, in xml_iter_parse
    for event, element in doc:
  File "/usr/lib/python3.8/xml/etree/ElementTree.py", line 1233, in iterator
    root = pullparser._close_and_return_root()
  File "/usr/lib/python3.8/xml/etree/ElementTree.py", line 1280, in _close_and_return_root
    root = self._parser.close()
xml.etree.ElementTree.ParseError: unclosed token: line 125099077, column 6

BoboTiG commented 2 years ago

It seems the previous step (the download) failed. Is the downloaded XML incomplete or truncated?
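One way to check for a truncated dump before running `--parse` is to stream through the file and see whether the XML parser reaches the end without an error. The following is only a minimal sketch, assuming the dump is a plain, already decompressed XML file; `xml_is_complete` is a hypothetical helper name and not part of wikidict.

```python
import xml.etree.ElementTree as ET


def xml_is_complete(path: str) -> bool:
    """Return True if the file at `path` parses as well-formed XML to the end.

    Hypothetical helper (not part of wikidict). It streams the file, so
    memory use stays low even for multi-gigabyte dumps.
    """
    try:
        context = ET.iterparse(path, events=("start", "end"))
        # The first event is the start of the root element.
        _event, root = next(context)
        for event, _element in context:
            if event == "end":
                # Drop already-parsed children so the tree does not grow.
                root.clear()
    except ET.ParseError:
        # Raised for truncated input, e.g. "unclosed token" or
        # "no element found".
        return False
    return True
```

A file that was cut off in the middle of a tag, as in the traceback above, would make this return `False`, which suggests downloading it again.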

Moonbase59 commented 2 years ago

Hm. Should I re-download? The file shows as 4.6 GB here.

Moonbase59 commented 2 years ago

Perfect, you’re correct. I wonder why it was truncated; it now shows 5 GB! Thanks!

Moonbase59 commented 2 years ago

Hm. Parsing works now, but the render step still will not finish. I will try again tomorrow…

matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --parse
>>> Processing data/fr/pages-20220120.xml ...
>>> Saved 1,882,269 words into data/fr/data_wikicode-20220120.json
>>> Parse done!
matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --render
>>> Loading data/fr/data_wikicode-20220120.json ...
>>> Loaded 1,882,269 words from data/fr/data_wikicode-20220120.json
 !! Missing 'ar-cf' template support for word 'Djamel'
 !! Missing 'ar-cf' template support for word 'azulejo'
 !! Missing 'ar-cf' template support for word 'Ali'
 !! Missing 'ar-cf' template support for word 'alcade'
 !! Missing 'ar-cf' template support for word 'Mourad'
 !! Missing 'ar-cf' template support for word 'cadi'
 !! Missing 'ar-cf' template support for word 'Zahra'
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/render.py", line 397, in render_word
    words[word] = details
  File "<string>", line 2, in __setitem__
  File "/usr/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/__main__.py", line 122, in <module>
    sys.exit(main())
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/__main__.py", line 65, in main
    return render.main(args["LOCALE"])
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/render.py", line 444, in main
    words = render(in_words, locale)
  File "/home/matthias/Projekte/ebook-reader-dict/wikidict/render.py", line 414, in render
    pool.map(partial(render_word, words=results, locale=locale), in_words.items())
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
EOFError

BoboTiG commented 2 years ago

This is the first time I have seen that error :thinking:

If you run the command again, does it still fail?

Moonbase59 commented 2 years ago

Yup, it did yesterday. Now trying with fresh pulls of the project and pyglossary, and with workers=4.

No success. I get some broken pipes:

matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --render --workers=4
>>> Loading data/fr/data_wikicode-20220120.json ...
>>> Loaded 1,882,269 words from data/fr/data_wikicode-20220120.json
 !! Missing 'ar-cf' template support for word 'azulejo'
 !! Missing 'ar-cf' template support for word 'Ali'
 !! Missing 'ar-cf' template support for word 'alcade'
 !! Missing 'ar-cf' template support for word 'Mourad'
Process ForkPoolWorker-4:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 405, in _send_bytes
    self._send(buf)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Killed
matthias@e6510:~/Projekte/ebook-reader-dict$ Process ForkPoolWorker-3:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-5:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

matthias@e6510:~/Projekte/ebook-reader-dict$ 

Moonbase59 commented 2 years ago

Phew, success. I needed to close the browser and all other apps (GoldenDict, MathPix, Shutter), and it succeeded just barely with workers=2.

matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --render --workers=2
>>> Loading data/fr/data_wikicode-20220120.json ...
>>> Loaded 1,882,269 words from data/fr/data_wikicode-20220120.json
 !! Missing 'ar-cf' template support for word 'Ali'
 !! Missing 'ar-cf' template support for word 'alcade'
>>> Saved 1,794,376 words into data/fr/data-20220120.json
>>> Render done!
matthias@e6510:~/Projekte/ebook-reader-dict$ python3 -m wikidict fr --convert
>>> Loading data/fr/data-20220120.json ...
>>> Loaded 1,794,376 words from data/fr/data-20220120.json
>>> Generated dict-fr-fr.df (122,195,089 bytes)
>>> Generated dicthtml-fr-fr.zip (35,634,409 bytes)
>>> Generated dict-fr-fr.df.bz2 (21,047,994 bytes)
>>> Generated dict-fr-fr.zip (31,322,906 bytes)
matthias@e6510:~/Projekte/ebook-reader-dict$
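
Closing other applications and lowering `--workers` both reduce memory pressure, so the failures above are consistent with the render step running out of RAM, although the thread does not confirm that. Under that assumption, a worker count could be chosen from the memory that is currently free. This is a rough sketch only: `pick_workers` and `free_memory_bytes` are hypothetical helpers, and the memory needed per worker is a guess that would have to be measured on the actual data.

```python
import os


def pick_workers(free_bytes: int, per_worker_bytes: int, cpu_count: int) -> int:
    """Cap the worker count by both CPU count and a memory budget.

    Hypothetical helper: `per_worker_bytes` is an assumed estimate of how
    much RAM one render worker needs, not a figure from the project.
    """
    if per_worker_bytes <= 0:
        raise ValueError("per_worker_bytes must be positive")
    by_memory = free_bytes // per_worker_bytes
    # Always allow at least one worker, never more than there are CPUs.
    return max(1, min(cpu_count, by_memory))


def free_memory_bytes() -> int:
    """Free physical memory in bytes (Linux-specific sysconf names).

    This counts free pages only, so it excludes reclaimable page cache
    and tends to underestimate what is really available.
    """
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
```

For example, with 8 GiB free, an assumed 3 GiB per worker, and 8 CPUs, `pick_workers(8 * 2**30, 3 * 2**30, 8)` gives 2, which could then be passed as `--workers=2`.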