FreeLanguageTools / vocabsieve

Simple sentence mining tool for language learning
GNU General Public License v3.0

Case-insensitive word search in audio library (German language) #147

Open voothi opened 8 months ago

voothi commented 8 months ago

Describe the bug

Please consider changing how word search works for audio libraries. This matters for German in particular, where capitalization is significant (nouns are capitalized). In versions 0.11.1 and 0.12.0, the search triggered from the "Word" field of the VocabSieve GUI is case-sensitive. That is appropriate for text dictionaries, but I don't think it is the right behavior for audio libraries. I also noticed that the duplicate check through AnkiConnect on the same "Word" field is case-insensitive. I suggest using the same case-insensitive mode when looking up "Word" field entries in audio libraries, in particular for German.

To be clear, the request is to make the search case-insensitive for one type of connected dictionary only: audio dictionaries (audio libraries). I ran into the problem with German and have not checked other languages.

The tests shown in the recordings use the audio library "de" from this source (see Telegram thread).

"iirc the lemmatizer pretty much always lowercases words minus proper nouns in other languages so it is less of an issue"

"Probably, but the Windows file system I'm currently running VocabSieve on is case insensitive as far as I know. At the same time, searching the audio library in my case is case sensitive in the German language learning mode."

"that's because it caches the filenames in a database"
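
The reply above suggests that file names are cached in a database, which would explain why the case-insensitive Windows file system does not help. The following is a minimal sketch of one way such a cached lookup could be made case-insensitive. It is not VocabSieve's actual code: the table layout and the names `build_index` and `find_audio` are hypothetical and used only for illustration. The sketch stores a lowercased key next to each file name instead of relying on SQLite's built-in `NOCASE` collation, because `NOCASE` only folds ASCII letters and would miss umlauts such as "Ä".

```python
import os
import sqlite3


def build_index(filenames):
    """Cache audio file names in an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audio (key TEXT, filename TEXT)")
    conn.execute("CREATE INDEX idx_audio_key ON audio (key)")
    # The key is the file name without extension, lowercased in Python so that
    # non-ASCII letters (ä, ö, ü) are handled as well.
    rows = [(os.path.splitext(name)[0].lower(), name) for name in filenames]
    conn.executemany("INSERT INTO audio (key, filename) VALUES (?, ?)", rows)
    return conn


def find_audio(conn, word):
    """Return cached file names whose key matches `word`, ignoring case."""
    cur = conn.execute(
        "SELECT filename FROM audio WHERE key = ? ORDER BY filename",
        (word.lower(),),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_index(["sprechen.mp3", "äpfel.ogg"])
print(find_audio(conn, "Sprechen"))
print(find_audio(conn, "Äpfel"))
```

`str.lower()` is used here rather than `str.casefold()` because casefolding maps "ß" to "ss", which would merge file names that differ only in that spelling. Whether that is desirable depends on the library.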

Local copy of the audio library "de" in Windows Explorer: photo_2024-03-26_09-17-02.jpg. As you can see, all file names are in lowercase.

For comparison, I have the same audio library "de" connected to GoldenDict-NG in the same user environment. Search in the audio library in that program is case-insensitive. Recording of the GoldenDict-NG test (GoldenDict-NG / De-Ru) with the German word "Sprechen": Recording 2024-03-26 081557.mp4

Image of GoldenDict-NG / Dictionaries / Sources / Sound Dirs / Path: doc_2024-03-26_09-23-35.png


To Reproduce

Steps to reproduce the behavior: Configure VocabSieve / General / Manage local resources.. (see image_2024-03-26_07-43-17.png)


Recording of the VocabSieve test with the German word "Sprechen": Recording 2024-03-26 072007.mp4

Expected behavior

See the recordings above: Recording 2024-03-26 072007.mp4, Recording 2024-03-26 081557.mp4

Screenshots

See above: Recording 2024-03-26 072007.mp4, Recording 2024-03-26 081557.mp4, doc_2024-03-26_09-23-35.png, image_2024-03-26_07-43-17.png

Logs

Desktop (please complete the following information):

Additional context

Telegram thread: "Please consider changing the behavior of the word search engine for audio libraries."