Open jzohrab opened 1 year ago
See jzohrab/lute-v3#5 for initial notes.
The Kobo dictionaries at https://www.epubor.com/kobo-dictionary-download-and-install.html are a good start, but you need to change the http download links to https.
When a dict is downloaded and you decompress the zip, it contains a bunch of files, e.g. co.html, but these are in fact gzip-compressed data. You can decompress them, e.g.:
cp co.html hack_co.html
gzip -S .html -d hack_co.html
mv hack_co hack_co.data
This results in a file called hack_co.data with data like the following:
<w><p><a name="correr"/><b>correr</b> [koˈreɾ]<br/><br/>
<p>Del latín <i>currere</i></p><br/><ol><li>Desplazarse rápidamente ....</li>
...
<variant name="corra"/>
<variant name="corre"/>
...
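The same decompression and a first pass over the entries could also be done in Python. This is a rough sketch only: it assumes the files are UTF-8 text once gunzipped, that each entry starts at a `<w>` tag, and that the headword is the first `<a name="..."/>` in the entry. Those assumptions come from the sample above and have not been checked against the full dictionary. The function names `decompress_kobo_file` and `extract_entries` are made up for illustration.

```python
import gzip
import re


def decompress_kobo_file(path):
    """Return the text of one gzip-compressed Kobo dictionary file (e.g. co.html).

    Assumption: the decompressed content is UTF-8 text.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


# Hypothetical patterns, based only on the sample data shown above.
ENTRY_RE = re.compile(r"<w>(.*?)(?=<w>|\Z)", re.DOTALL)
HEADWORD_RE = re.compile(r'<a name="([^"]+)"\s*/>')
VARIANT_RE = re.compile(r'<variant name="([^"]+)"\s*/>')


def extract_entries(text):
    """Yield (headword, [variants]) for each <w> entry in the decompressed text."""
    for match in ENTRY_RE.finditer(text):
        body = match.group(1)
        head = HEADWORD_RE.search(body)
        if head is None:
            # Entry without a recognizable headword: skip it rather than guess.
            continue
        yield head.group(1), VARIANT_RE.findall(body)
```

Calling `extract_entries(decompress_kobo_file("co.html"))` on the real file should then give pairs like `("correr", ["corra", "corre", ...])`, if the assumptions hold.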
So, these files could be pre-processed so that every (??) variant of a word, plus the word itself, becomes an entry in an initial index into the data files. A Lute-Kobo lookup could then look like this: given the input word fui, the pre-processed file initial_index_fu.data contains something like fui: ir (fui being one of the variants of ir, we hope!), and then the actual lookup is done using ir to get the definition.
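A minimal sketch of that index idea, kept in memory for brevity. Bucketing by the first two letters mirrors the initial_index_fu.data naming, but the bucket layout, the lowercasing, and the names `build_index` and `resolve` are assumptions for illustration. The sample entries are invented, not taken from the real dictionary.

```python
from collections import defaultdict


def build_index(entries):
    """Map every headword and variant to its headword, bucketed by two-letter prefix.

    `entries` is an iterable of (headword, [variants]) pairs.
    """
    buckets = defaultdict(dict)
    for headword, variants in entries:
        for form in [headword, *variants]:
            key = form.lower()
            # First headword seen wins here; ambiguity is ignored in this sketch.
            buckets[key[:2]].setdefault(key, headword)
    return buckets


def resolve(buckets, word):
    """Return the headword for `word`, or None if the word is not indexed."""
    key = word.lower()
    return buckets.get(key[:2], {}).get(key)


# Invented sample data: assumes the dictionary lists "fui" as a variant of "ir".
entries = [("ir", ["fui", "voy", "iba"]), ("correr", ["corra", "corre"])]
index = build_index(entries)
print(resolve(index, "fui"))  # ir
```

Each bucket could be written out as one file of `variant: headword` lines, so a lookup only needs to read the small file matching the first letters of the input word.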
I don't know how this would/should work for ambiguous mappings. Perhaps something like gato: gato; gatar (if there is a word like gatar).
If nothing is found, just return 'not found'.
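One possible way to handle ambiguous mappings and the 'not found' case, sketched under assumptions: each form maps to a list of headwords, the index line uses the `form: head1; head2` shape suggested above, and `definitions` is a hypothetical dict from headword to definition text. "gatar" is only a made-up example word, as in the note above.

```python
def build_multi_index(entries):
    """Map each form to a list of all headwords it could belong to."""
    index = {}
    for headword, variants in entries:
        for form in [headword, *variants]:
            heads = index.setdefault(form.lower(), [])
            if headword not in heads:
                heads.append(headword)
    return index


def format_index_line(form, headwords):
    """Render one index line, e.g. 'gato: gato; gatar'."""
    return f"{form}: {'; '.join(headwords)}"


def lookup(index, definitions, word):
    """Return every matching definition, or 'not found' if there is none."""
    heads = index.get(word.lower(), [])
    found = [definitions[h] for h in heads if h in definitions]
    return "\n\n".join(found) if found else "not found"


# Made-up data: "gatar" is hypothetical, used only to show an ambiguous form.
entries = [("gato", []), ("gatar", ["gato"])]
index = build_multi_index(entries)
print(format_index_line("gato", index["gato"]))  # gato: gato; gatar
```

Returning all candidate definitions leaves the choice to the reader, which seems simpler than trying to guess the intended headword.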
The pre-processing could be done outside of Lute, or as a heavyweight initial load. Outside is better, I think: less crap to go wrong in the app, separate concern.
This is a good idea, simple offline-style dict.