Genthus / Freqman

Anki addon to sort cards according to a frequency dictionary
MIT License

Possibility of other dictionaries? #4

Open ghost opened 1 year ago

ghost commented 1 year ago

Hey! Could I format other files so they can be sorted like these ones from Yomichan? I study many languages, including Chinese, and these don't work in your program. Is there a guide so I could make my own? Thanks

Genthus commented 1 year ago

Hi! Thank you for checking out the project. Could you link me to one of these dictionaries so I can try to figure it out?

Currently the addon only supports JSON frequency dictionaries with a schema like those for Yomichan. However, we could add more types once we figure out how the information is structured.

ghost commented 1 year ago

Found these by googling, and they all seem to have a different structure. The monolingual ones [ZH-ZH] are the best: https://drive.google.com/drive/folders/1-WReUZvneHEkvnjeqJA4oT08od7eK-sq

Genthus commented 1 year ago

I'm checking out [ZH-ZH] 萌典国语辞典 (简体字). I don't know Chinese, so you'll have to tell me which field is the frequency, if there is one here:

    [
        "扒",
        "(一)bā",
        "",
        "",
        0,
        [
            "【扒】 [9960] [动]\n1.剥开。如:「把橘子扒开来吃。」\n2.扯掉。如:「扒衣裳」。\n3.挖开、刨开。如:「扒土」、「扒堤」。\n4.用手攀住东西。如:「扒着栏杆」、「你扒住,不然会掉下去。」"
        ],
        2,
        ""
    ],
    [
        "叭",
        "bā",
        "",
        "",
        0,
        [
            "【叭】 [107k+] [名]\n参见「喇叭」条。\n[状]\n形容汽车、机车的喇叭声。如:「忽听叭的一声,吓了我一跳!」或叠字作「叭叭」。"
        ],
        3,
        ""
    ],

ghost commented 1 year ago

The numbers in brackets seem to be the frequency [9960] [107k+]. Don't see any other indicator :/

Genthus commented 1 year ago

It seems like it's occurrence-based instead of rank-based, but the addon supports that, so that's not the main problem. The way I see it, supporting this dictionary directly would be too specific. I think it would be better to make a new dictionary by extracting the terms and frequencies here and using an existing Yomichan frequency dictionary schema. I don't think making this new dictionary would be too much of a problem, but let me know if you think this solution would work or if there's a better dictionary to use.
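
A minimal sketch of that extraction, assuming each row has the shape shown in the sample above (term at index 0, list of definition strings at index 5) and that the first bracketed number in the definition is the count. The `[term, "freq", value]` output rows follow the Yomichan term-meta layout as commonly documented; `parse_frequency` and `to_frequency_entries` are hypothetical names, and this is not necessarily how the linked script works.

```python
import re

# Matches a bracketed count such as "[9960]" or "[107k+]".
# Tags like "[动]" contain no digits and are skipped.
FREQ_PATTERN = re.compile(r"\[(\d+)(k?)\+?\]")


def parse_frequency(definition):
    """Return the first bracketed count as an int, or None if absent."""
    match = FREQ_PATTERN.search(definition)
    if match is None:
        return None
    number, kilo = match.groups()
    # Assumption: a trailing "k" means thousands; a trailing "+" is dropped.
    return int(number) * (1000 if kilo else 1)


def to_frequency_entries(term_bank):
    """Convert term-bank rows into [term, "freq", value] rows."""
    entries = []
    for row in term_bank:
        term, definitions = row[0], row[5]
        # Only plain-string definitions are handled in this sketch.
        if not definitions or not isinstance(definitions[0], str):
            continue
        freq = parse_frequency(definitions[0])
        if freq is not None:
            entries.append([term, "freq", freq])
    return entries
```

Rows without a recognizable count are skipped rather than given a placeholder, so every value that reaches the output is a real integer.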

ghost commented 1 year ago

Oh, that makes more sense, yes... These seem to be the best dictionaries for Chinese one could use, as far as I've searched, so yeah, I think this would be the best solution.

Genthus commented 1 year ago

I made a script to create the frequency dictionary, if you want to use it: https://github.com/Genthus/yomichan-freq-dictionary-maker

I tried importing it into the addon and it seemed to work fine, but let me know if you have any issues when sorting.

ghost commented 1 year ago

Working!! Thanks so much :)

This error pops up after about 3-4 minutes of processing:

Anki 2.1.56 (07fd88dd) Python 3.10.9 Qt 6.4.2 PyQt 6.4.0
Platform: Linux-5.15.87-1-lts-x86_64-with-glibc2.36
Flags: frz=False ao=True sv=3
Add-ons, last update check: 2023-01-15 13:10:20
Add-ons possibly involved: Freqman

Caught exception:
Traceback (most recent call last):
  File "/home/yaoberh/.local/share/Anki2/addons21/1502429998/progressWindow.py", line 19, in run
    recalculate()
  File "/home/yaoberh/.local/share/Anki2/addons21/1502429998/ordering.py", line 156, in recalculate
    cleanSorted()
  File "/home/yaoberh/.local/share/Anki2/addons21/1502429998/ordering.py", line 88, in orderCardsInDB
    highest = getHighestFreqVal() + 1
TypeError: can only concatenate str (not "int") to str

Maybe more languages could be supported if their lists come from the same source and share the same formatting. For example, you could pick the frequency lists from https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists and then assign priority 1 to the first word, priority 200 to the 200th word, and so forth. I've been studying languages like this together with MorphMan until now, but this project of yours is much easier to manage :D
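
A minimal sketch of that ranking idea, assuming the words have already been pulled out of a list into a plain Python sequence, most frequent first. `rank_entries` is a hypothetical helper, and the `[word, "freq", rank]` rows assume the same Yomichan-style layout the addon already reads.

```python
def rank_entries(words):
    """Assign rank 1 to the first word, 2 to the second, and so on.

    Blank lines and repeated words are skipped so ranks stay contiguous.
    """
    entries = []
    seen = set()
    for word in words:
        word = word.strip()
        if not word or word in seen:
            continue
        seen.add(word)
        # The rank is the word's position among the unique words so far.
        entries.append([word, "freq", len(entries) + 1])
    return entries
```

For example, `rank_entries(["the", "of", "and"])` gives the first word rank 1 and the third word rank 3. Since these are ranks rather than occurrence counts, a lower number means a more common word.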

Genthus commented 1 year ago

I had that same issue a little while ago and pushed an update. Can you check if there are any updates available, and try deleting and reimporting the dictionary as well, please? I messed up and it threw an error on a few words.

As for the additional languages, I'll check out the lists and make a plan. Thank you for your support! If you are using the addon for multiple languages, I imagine it would be much easier to be able to select a dictionary for each card type you have. Is this something you would see as a useful feature?

ghost commented 1 year ago

Happy to help! I've made profiles for each language so as to not mix the notes into each other, but I see how this would be a very cool addition.

ghost commented 1 year ago

Now I'm getting this error, which looks like it's got something to do with my cards, right?

Caught exception:
Traceback (most recent call last):
  File "/home/yaoberh/.local/share/Anki2/addons21/Freqman/progressWindow.py", line 19, in run
    recalculate()
  File "/home/yaoberh/.local/share/Anki2/addons21/Freqman/ordering.py", line 161, in recalculate
    orderCardsInDB()
  File "/home/yaoberh/.local/share/Anki2/addons21/Freqman/ordering.py", line 88, in orderCardsInDB
    highest = getHighestFreqVal() + 1
  File "/home/yaoberh/.local/share/Anki2/addons21/Freqman/db.py", line 281, in getHighestFreqVal
    return int(res.fetchone()[0])
ValueError: invalid literal for int() with base 10: '?'

Genthus commented 1 year ago

I pushed a new version. Can you update, remove the dictionary, and add it again?
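
Both tracebacks above come from a stored frequency value that is not a clean integer (a string in the first, `'?'` in the second). A minimal sketch of a defensive way to find the highest value, assuming the stored values can be read into a Python list first. `to_int_or_none` and `highest_freq` are hypothetical helpers and not the addon's actual fix; the real `db.py` is not shown here.

```python
def to_int_or_none(value):
    """Return value as an int, or None if it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Covers None, "?", "107k+" and other non-numeric values.
        return None


def highest_freq(values, default=0):
    """Return the largest convertible value, or default if there is none."""
    numbers = [n for n in map(to_int_or_none, values) if n is not None]
    return max(numbers, default=default)
```

With this, a stray `'?'` is ignored instead of aborting the whole recalculation, though cleaning the values at import time is probably the better place to catch them.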