FreeLanguageTools / vocabsieve

Simple sentence mining tool for language learning
GNU General Public License v3.0

KOReader vocab: filtering by language potentially too aggressive #167

Open artjomsR opened 3 months ago

artjomsR commented 3 months ago

Is your feature request related to a problem? Please describe. Recently, I couldn't get VocabSieve to import vocab from one of my books. After reading the logs and checking the metadata, I discovered that the book was mistakenly tagged as English, although the actual text is in Portuguese. I tried changing the language in Calibre and overriding the value in KOReader, but the change still wasn't picked up, so I can see this being a pain point for other users if the same happens to them. I understand that the filtering is beneficial to users who are learning multiple languages at the same time. But for those learning only one, it can wrongly exclude books whose language tag is incorrect.

For the time being, I've commented out if book[0].startswith(langcode) in https://github.com/FreeLanguageTools/vocabsieve/blob/master/vocabsieve/importer/KoreaderVocabImporter.py#L49

Describe the solution you'd like Rather than including only highlights from the user's target language, have two headings under "Select books to extract highlights from":

  1. Highlights from target language - basically what is currently displayed
  2. Highlights from other languages - the inverse of the above
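A minimal sketch of how the two groups could be built, reusing the same `book[0].startswith(langcode)` check instead of dropping non-matching books. The function name `split_books_by_language` and the `(language_tag, title)` tuple shape are assumptions made for illustration, not the importer's actual data structure:

```python
from typing import Iterable, List, Tuple

Book = Tuple[str, str]  # hypothetical shape: (language_tag, title)


def split_books_by_language(
    books: Iterable[Book], langcode: str
) -> Tuple[List[Book], List[Book]]:
    """Partition books into (target-language, other-language) lists.

    Uses the same prefix test as the existing filter, so "pt-BR"
    still matches a target language code of "pt".
    """
    target: List[Book] = []
    other: List[Book] = []
    for book in books:
        if book[0].startswith(langcode):
            target.append(book)
        else:
            # Kept instead of discarded, so a mis-tagged book stays selectable.
            other.append(book)
    return target, other
```

With this, a Portuguese book mis-tagged as `en` would land in the second list rather than disappearing from the selection entirely.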


1over137 commented 3 months ago

For KOReader we are reading from the metadata.lua file, not the ebook file. In fact the ebook file is not really opened at all.

https://docs.freelanguagetools.org/importers/KOReader.html#cant-find-my-books
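To check which language tag a given sidecar file carries, something like the following sketch could be used. It assumes the language is stored as a `["language"] = "..."` entry in metadata.lua; that layout and the helper name `read_language` are assumptions for illustration, not a description of how VocabSieve parses the file:

```python
import re
from typing import Optional

# Assumed layout: a Lua table entry such as  ["language"] = "en",
LANGUAGE_RE = re.compile(r'\["language"\]\s*=\s*"([^"]*)"')


def read_language(metadata_text: str) -> Optional[str]:
    """Return the first language tag found in the sidecar text, or None."""
    match = LANGUAGE_RE.search(metadata_text)
    return match.group(1) if match else None


# Hypothetical sample contents, not copied from a real file.
sample = '''
return {
    ["doc_props"] = {
        ["language"] = "en",
        ["title"] = "Um livro",
    },
}
'''
print(read_language(sample))
```

If the tag printed here is not the target language, changing the ebook's metadata in Calibre alone would not be expected to help, since the ebook file itself is not read.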

The trouble is that the lookup history import happens before card creation. People presumably often read in other languages on the same device as well, which would add a lot of noise to the lookup history if it were not filtered. Maybe a better UI would help here.