Ajatt-Tools / anki.koplugin

KOReader plugin enabling Anki card generations for words looked up in the internal dictionary.

Add kana reading to created note #26

Closed oldmerkum closed 4 months ago

oldmerkum commented 4 months ago

Plugin appears to work well for mining vocab from my old Kindle. However, I notice that only the kanji, English definition and sentence context are extracted. I'd like to also get the kana reading of the word added to the note. (Sorry, this is more of a request than an issue.)

nairyosangha commented 4 months ago

There's already some functionality for that built in.

For example, you could make an extension like this:

local KanaFieldPopulator = {
    kana_field = "<NAME OF FIELD ON CARD GOES HERE>"
}

function KanaFieldPopulator:run(note)
    if not self.popup_dict.is_extended then
        self.popup_dict.results = require("langsupport/ja/dictwrapper").extend_dictionaries(self.popup_dict.results, self.conf)
        self.popup_dict.is_extended = true
    end

    local selected_dict = self.popup_dict.results[self.popup_dict.dict_index]
    local kana = selected_dict:get_kana_words():get()
    note.fields[self.kana_field] = table.concat(kana, ", ")
    return note
end

return KanaFieldPopulator

Save this as EXT_kana_populator.lua or something like that in the 'extensions' folder

However, this can't magically know which part of your dictionary definition contains the kana it needs to extract, of course. It uses patterns for that, defined here: https://github.com/Ajatt-Tools/anki.koplugin/blob/9ff184e3ad68afce11e56b61ee716b4c6d8adfbe/langsupport/ja/dictwrapper.lua#L14-L20

with the following as a fallback: https://github.com/Ajatt-Tools/anki.koplugin/blob/9ff184e3ad68afce11e56b61ee716b4c6d8adfbe/langsupport/ja/dictwrapper.lua#L8

This assumes your dictionary has headwords like this: しょうにん【商人】. You may have to update the list if the default pattern doesn't work for you. The dictionary name you're supposed to use is stored in the dictionary's .ifo file.
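As a rough illustration of what such a pattern does with a headword in that format, here is a standalone Lua sketch. It only uses the standard string library and does not touch the plugin's code; the variable names are made up for the example, and the pattern is assumed to be "(.*)【.*】".

```lua
-- Assumed fallback pattern: capture everything in front of the 【...】 part.
local fallback_pattern = "(.*)【.*】"

-- Headword in the expected format: <kana>【<kanji>】
local kana = ("しょうにん【商人】"):match(fallback_pattern)
print(kana)      -- しょうにん

-- A headword without the brackets gives the pattern nothing to match.
local no_match = ("通う"):match(fallback_pattern)
print(no_match)  -- nil
```

If the second case is what your dictionary's headwords look like, the default won't find a reading and a dictionary-specific entry is needed.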

oldmerkum commented 4 months ago

I was able to get the extension added. I initially forgot to edit kana_field = "<NAME OF FIELD ON CARD GOES HERE>", causing koreader to crash when I attempted to add a word to anki. After editing that, I had to delete the saved json where cards are locally stored on my kindle to get the anki plugin to load properly again, since it would silently fail on koreader startup.

Now when I add a card to anki, the reading still doesn't get added.

I do see this being added to the Glossary field though. So the pattern seems correct for "(.*)【.*】"

<div class="definition">
  <ol>
    <li dict="JMdict-ja-en">かよう【通う】
    <br>〘v5u・vi〙
    <br>1 to go to and from (a place); to go back and forth between; to run between (e.g. bus, train, etc.); to ply between.
    <br>2 to go to (school, work, etc.); to attend; to commute; to frequent.
    <br>3 to circulate (e.g. blood, electricity); to be communicated (e.g. thought).
    <br>→血が通う1・心が通う
    <br>4 to resemble.
    </li>
  </ol>
</div>

Does this need to be edited perhaps? ["JMdict Rev. 1.9"] Maybe not, since it is correctly adding the kanji and definition...

This is the contents of koreader/data/dict/JMdict Japanese-English dictionary/JMdict-ja-en/JMdict-ja-en.ifo

StarDict's dict ifo file
version=3.0.0
bookname=JMdict-ja-en
wordcount=188380
idxfilesize=8347959
synwordcount=484272

lang=ja-en

nairyosangha commented 4 months ago

since it would silently fail on koreader startup.

I'll do something about this one of these days. The problem is that it loads these notes the moment a user opens a document, and it'd be kinda weird to immediately show them a popup even if they haven't interacted with the plugin at all yet.

I do see this being added to the Glossary field though. So the pattern seems correct for "(.*)【.*】"

The pattern indeed looks correct. The problem is that it runs this pattern on the headword, NOT the definition itself. For JMdict the headword is probably just 通う, so there's nothing for it to match. You'll have to add it to the list after all:

kana_pattern = {
    -- key: dictionary name as displayed in KOreader (received from dictionary's .ifo file)
    -- value: a table containing 2 entries:
    -- 1) the dictionary field to look for the kana reading in (either 'word' or 'description')
    -- 2) a pattern which should return the kana reading(s) (the pattern will be looked for multiple times!)
    ["JMdict Rev. 1.9"] = {"definition", "<font color=\"green\">(.-)</font>"},
    ["JMdict-ja-en"] = {"definition", "(.*)【.*】"},
},

I believe something like that should work
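To make the headword vs. definition difference concrete, here is a simplified, hypothetical stand-in for that lookup. It is not the plugin's real code: the extract_kana helper, the fallback table, the field names 'word' and 'definition', and the plain-text sample entry are all assumptions made for this sketch.

```lua
-- Hypothetical per-dictionary rules, shaped like the table above.
local kana_pattern = {
    ["JMdict-ja-en"] = { "definition", "(.*)【.*】" },
}
-- Assumed fallback: run the default pattern on the headword.
local fallback = { "word", "(.*)【.*】" }

-- Collect every capture of the configured pattern in the configured field.
local function extract_kana(dict_name, entry)
    local rule = kana_pattern[dict_name] or fallback
    local field, pattern = rule[1], rule[2]
    local readings = {}
    for reading in entry[field]:gmatch(pattern) do
        table.insert(readings, reading)
    end
    return readings
end

-- Made-up entry: bare headword, reading only present in the definition.
local entry = {
    word = "通う",
    definition = "かよう【通う】\n〘v5u・vi〙\n1 to go to and from (a place)",
}

-- Unknown dictionary name: falls back to the headword, finds nothing.
print(#extract_kana("some other dictionary", entry))            -- 0
-- Listed dictionary name: searches the definition instead.
print(table.concat(extract_kana("JMdict-ja-en", entry), ", "))  -- かよう
```

The key has to match the dictionary name exactly, otherwise the lookup silently falls through to the fallback.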

nairyosangha commented 4 months ago

Also, maybe a bit late to mention this now, but there are Anki addons that do this as well, and they are quite a bit smarter than what you'd get with this extension on the KOReader plugin side: https://ankiweb.net/shared/info/1344485230

oldmerkum commented 4 months ago

Messed around with this a while. Think this got it working, in config.lua: enabled_extensions = { Need to explicitly enable extensions, it seems.

oldmerkum commented 4 months ago

Thanks for the suggestion about using an Anki addon instead, btw. I keep discovering that centralizing a lot of things into Anki is simpler (had a time trying to get frequency data in my lookups, then found Anki has an addon for that as well).

nairyosangha commented 4 months ago

Messed around with this a while. Think this got it working, in config.lua: enabled_extensions = { Need to explicitly enable extensions, it seems.

Indeed, I wouldn't wanna run all those automatically for users that don't care about it.

I take it this is done then? Feel free to reopen if you disagree