Rct567 / FrequencyMan

An Anki plugin to sort your new cards.
GNU General Public License v3.0

FrequencyMan (Anki Plugin)

Overview

FrequencyMan allows you to sort your new cards by word frequency, familiarity, and other useful factors.

Tested on Anki 2.1.60 (Qt6) and 23.12.1 (Qt6).

Features

Basic usage

  1. Open the "FrequencyMan" menu option in the "Tools" menu of the main Anki window.
  2. This will open FrequencyMan's main window where you can define your sorting targets.
  3. Define the targets using a JSON array of objects. Each object represents a target to sort (a target can be a deck or a defined selection of cards).
  4. Click the "Reorder Cards" button to apply the sorting.
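Because the targets are plain JSON, a definition can be sanity-checked before it is pasted into the window. The Python sketch below only checks the shape described in step 3 (a JSON array of objects). The function name `parse_targets` is hypothetical, and this is not FrequencyMan's own validation, which may accept or reject more than this.

```python
import json


def parse_targets(text: str) -> list[dict]:
    """Parse a target definition and check that it is a JSON array of objects."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, list):
        raise ValueError("targets must be a JSON array")
    for index, target in enumerate(data):
        if not isinstance(target, dict):
            raise ValueError(f"target #{index} must be a JSON object")
    return data


example = (
    '[{"deck": "Spanish", "notes": '
    '[{"name": "Basic", "fields": {"Front": "EN", "Back": "ES"}}]}]'
)
targets = parse_targets(example)
print(len(targets), targets[0]["deck"])
```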

Configuration examples

Example 1

Reorders a single deck. This will only match cards with note type Basic located in deck Spanish. It will also use the default ranking factors.

The content of the cards and all the ranking metrics will be analyzed per 'language'. The result of this will be combined to determine the final ranking of all new cards in the defined target.

[
    {
        "deck": "Spanish",
        "notes": [
            {
                "fields": {
                    "Front": "EN",
                    "Back": "ES"
                },
                "name": "Basic"
            }
        ]
    }
]

Example 2

Reorders the same deck twice. The first target excludes cards whose card type name matches "Speaking" from sorting, while the second target sorts only those excluded cards.

The first target only modifies a single ranking factor, while the second target reduces the ranking factors used to only 2 factors.

Note: Both targets use the same 'main scope', which is the selection of cards used to create the data to calculate the ranking. This scope is reduced for each target by reorder_scope_query to limit which cards get repositioned.

[
    {
        "deck": "Spanish",
        "notes": [
            {
                "fields": {
                    "Meaning": "EN",
                    "Sentence": "ES"
                },
                "name": "Basic (customized note type)"
            }
        ],
        "reorder_scope_query": "-card:*Speaking*",
        "ranking_familiarity": 8
    },
    {
        "deck": "Spanish",
        "notes": [
            {
                "fields": {
                    "Meaning": "EN",
                    "Sentence": "ES"
                },
                "name": "Basic (customized note type)"
            }
        ],
        "reorder_scope_query": "card:*Speaking*",
        "ranking_factors": {
            "familiarity": 1,
            "word_frequency": 1
        }
    }
]
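To see how the two reorder_scope_query values in Example 2 split one deck into two non-overlapping groups, the Python sketch below imitates just the `card:` term with `*` wildcards and the `-` negation prefix. It is a rough illustration, not Anki's search engine: the helper `matches_card_query`, the card type names, and the case-insensitive comparison are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def matches_card_query(card_type_name: str, query: str) -> bool:
    """Rough imitation of an Anki `card:` search term with `*` wildcards.

    Only handles the two forms used in Example 2: `card:PATTERN` and
    `-card:PATTERN` (negated). Real Anki searches support far more.
    """
    negated = query.startswith("-")
    term = query[1:] if negated else query
    if not term.startswith("card:"):
        raise ValueError("only card: terms are handled in this sketch")
    pattern = term[len("card:"):]
    # Assumption: compare case-insensitively by lowercasing both sides.
    hit = fnmatchcase(card_type_name.lower(), pattern.lower())
    return hit != negated  # flip the result for a negated term


# Hypothetical card type names, for illustration only.
card_types = ["Reading", "Speaking", "Listening", "Speaking (slow)"]

first_target = [c for c in card_types if matches_card_query(c, "-card:*Speaking*")]
second_target = [c for c in card_types if matches_card_query(c, "card:*Speaking*")]

print(first_target)   # ['Reading', 'Listening']
print(second_target)  # ['Speaking', 'Speaking (slow)']
```

Every card lands in exactly one of the two lists, which is why the two targets can share a main scope without repositioning the same card twice.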

Example 3

Reorder only based on word frequency (using word frequency from both front and back):

[
    {
        "deck": "Spanish::Essential Spanish Vocabulary Top 5000",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {
                    "Front": "ES",
                    "Back": "EN"
                }
            }
        ],
        "ranking_factors": {
            "word_frequency": 1
        }
    }
]

Tokenizers

Custom tokenizers can be defined in user_files\tokenizers.

To use a custom tokenizer, or to see how one is defined, you can download a working copy of Jieba (ZH) or a version of Janome (JA).

If you download Janome (JA), you can place it in a directory like user_files\tokenizers\janome, which then should contain the file fm_init_janome.py and the subdirectory janome.
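The expected layout for Janome can be checked with a few lines of Python. This is only a sketch of the directory structure described above; the function name `janome_layout_problems` is hypothetical, and the demo runs against a temporary directory rather than a real user_files folder.

```python
import tempfile
from pathlib import Path


def janome_layout_problems(user_files: Path) -> list[str]:
    """Return a list of missing pieces in user_files/tokenizers/janome."""
    base = user_files / "tokenizers" / "janome"
    problems = []
    if not (base / "fm_init_janome.py").is_file():
        problems.append("missing file: fm_init_janome.py")
    if not (base / "janome").is_dir():
        problems.append("missing directory: janome")
    return problems


# Demo against a throwaway directory standing in for user_files.
with tempfile.TemporaryDirectory() as tmp:
    user_files = Path(tmp)
    before = janome_layout_problems(user_files)
    (user_files / "tokenizers" / "janome" / "janome").mkdir(parents=True)
    (user_files / "tokenizers" / "janome" / "fm_init_janome.py").touch()
    after = janome_layout_problems(user_files)

print(before)  # both pieces reported as missing
print(after)   # []
```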

Automatic support

FrequencyMan will use tokenizers from other plugins, if there is no custom tokenizer for a given language:

Ranking factors

Default ranking factors

"ranking_factors": {
    "word_frequency": 1.0,
    "familiarity": 1.0,
    "familiarity_sweetspot": 0.5,
    "lexical_underexposure": 0.25,
    "ideal_focus_word_count": 4.0,
    "ideal_word_count": 1.0,
    "reinforce_learning_words": 1.5,
    "most_obscure_word": 0.5,
    "lowest_fr_least_familiar_word": 0.25,
    "lowest_word_frequency": 1.0,
    "lowest_familiarity": 1.0,
    "new_words": 0.5,
    "no_new_words": 0.0,
    "ideal_new_word_count": 0.0,
    "proper_introduction": 0.1,
    "proper_introduction_dispersed": 0.0
}
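Example 2 shows two ways of changing these defaults: a single `ranking_<name>` key (such as "ranking_familiarity": 8) adjusts one factor, while a "ranking_factors" object reduces the set to only the factors listed. The Python sketch below illustrates that behaviour. The function name `resolve_ranking_factors` is hypothetical, and the order in which the two mechanisms are applied is an assumption, not FrequencyMan's documented implementation.

```python
DEFAULT_RANKING_FACTORS = {
    "word_frequency": 1.0,
    "familiarity": 1.0,
    "familiarity_sweetspot": 0.5,
    "lexical_underexposure": 0.25,
    "ideal_focus_word_count": 4.0,
    "ideal_word_count": 1.0,
    "reinforce_learning_words": 1.5,
    "most_obscure_word": 0.5,
    "lowest_fr_least_familiar_word": 0.25,
    "lowest_word_frequency": 1.0,
    "lowest_familiarity": 1.0,
    "new_words": 0.5,
    "no_new_words": 0.0,
    "ideal_new_word_count": 0.0,
    "proper_introduction": 0.1,
    "proper_introduction_dispersed": 0.0,
}


def resolve_ranking_factors(target: dict) -> dict[str, float]:
    """Work out the effective ranking factor weights for one target."""
    # A "ranking_factors" object replaces the defaults entirely
    # (assumption based on the description of Example 2).
    base = target.get("ranking_factors", DEFAULT_RANKING_FACTORS)
    factors = {name: float(weight) for name, weight in base.items()}
    # A "ranking_<name>" key overrides a single factor.
    for key, value in target.items():
        if key.startswith("ranking_") and key != "ranking_factors":
            factors[key[len("ranking_"):]] = float(value)
    return factors


first = resolve_ranking_factors({"deck": "Spanish", "ranking_familiarity": 8})
second = resolve_ranking_factors(
    {"deck": "Spanish", "ranking_factors": {"familiarity": 1, "word_frequency": 1}}
)
print(first["familiarity"], len(first))  # 8.0 16
print(second)  # {'familiarity': 1.0, 'word_frequency': 1.0}
```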

Description

Custom fields

The following fields will be automatically populated when you reorder your cards:

Dynamic field names (the number at the end can be replaced with the index number of any field defined in the target):

For debug purposes:

Display focus words on the back of your cards (html example)

{{#fm_focus_words}}
  <p> <span style="opacity:0.65;">Focus:</span> {{fm_focus_words}} </p>
{{/fm_focus_words}}

Target settings

For each defined target, the following settings are available:

| Setting | Type | Description | Default value |
| --- | --- | --- | --- |
| deck | string | Name of a single deck as main scope. | - |
| decks | array of strings | An array of deck names as main scope. | - |
| scope_query | string | Search query as main scope. | - |
| notes | array of objects | | - |
| reorder_scope_query | string | Search query to reduce which cards get repositioned. | Main scope as defined by deck, decks or scope_query. |
| ranking_factors | object | | see 'Ranking factors' |
| familiarity_sweetspot_point | string or float | Defines a specific 'sweetspot' of familiarity for ranking factor familiarity_sweetspot. | "~0.5" (=50% of focus_words_max_familiarity) |
| suspended_card_value | float | | 0.1 |
| suspended_leech_card_value | float | | 0.0 |
| ideal_word_count | array of two ints | | [1, 5] |
| focus_words_max_familiarity | float | Defines the maximum familiarity value of focus words. Words above this threshold are considered 'mature'. | 0.28 |
| corpus_segmentation_strategy | string | Corpus data of a target is joined by language data id by default, but could also stay 'per note field' by setting it to "by_note_model_id_and_field_name". | "by_lang_data_id" |
| id | string | Enables reorder logging for this target. | None, reorder logging is disabled by default. |
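As a worked example of the default familiarity_sweetspot_point, "~0.5" means 50% of focus_words_max_familiarity, so with the default of 0.28 the sweetspot sits at 0.14. The Python sketch below shows that arithmetic. The function name `resolve_sweetspot_point` is hypothetical, and treating a plain float as an absolute familiarity value is an assumption.

```python
def resolve_sweetspot_point(point, focus_words_max_familiarity: float = 0.28) -> float:
    """Turn a familiarity_sweetspot_point setting into a familiarity value."""
    if isinstance(point, str) and point.startswith("~"):
        # "~0.5" is read as a fraction of focus_words_max_familiarity.
        return float(point[1:]) * focus_words_max_familiarity
    # Assumption: any other value is taken as an absolute familiarity value.
    return float(point)


print(resolve_sweetspot_point("~0.5"))        # 0.14
print(resolve_sweetspot_point("~0.5", 0.40))  # 0.2
print(resolve_sweetspot_point(0.1))           # 0.1
```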

Notes:

Language data id

For each field a language_data_id must be defined. In most cases this should just be a two letter language code (ISO 639-1), such as EN or ES:

[
    {
        "deck": "Spanish::Essential Spanish Vocabulary Top 5000",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {
                    "Spanish": "ES",
                    "English": "EN"
                }
            }
        ]
    }
]

Alternatively, a language_data_id can also be an 'extended two letter language code':

[
    {
        "deck": "Medical",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {
                    "Front": "EN_MEDICAL",
                    "Back": "EN_MEDICAL"
                }
            }
        ]
    }
]

For every language data id defined, a directory should exist (although it could be empty). In the example above, \user_files\lang_data\en_medical should exist. If it does not exist, you will be prompted to automatically create one with a default word frequency file shipped with FrequencyMan.
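To find out which directories a set of targets will need, the language data ids can be collected from the field mappings. The Python sketch below does this for the example above. The function name `required_lang_data_dirs` is hypothetical, and lowercasing the id to get the directory name is an assumption based on EN_MEDICAL mapping to en_medical.

```python
from pathlib import Path


def required_lang_data_dirs(targets: list[dict]) -> list[str]:
    """Collect the directory names implied by the language data ids in use."""
    names = set()
    for target in targets:
        for note in target.get("notes", []):
            for lang_data_id in note.get("fields", {}).values():
                # Assumption: the directory name is the lowercased id.
                names.add(lang_data_id.lower())
    return sorted(names)


targets = [
    {
        "deck": "Medical",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {"Front": "EN_MEDICAL", "Back": "EN_MEDICAL"},
            }
        ],
    }
]

user_files = Path("user_files")
for name in required_lang_data_dirs(targets):
    print(user_files / "lang_data" / name)
```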

Two different types of files can be placed in a language data id directory:

Reorder logging

Reorder logging is an optional feature that can be enabled by defining an id for a target. When enabled, it logs information about the content of that target each time the cards are reordered.

Display the number of mature words

The logged information can be used to display the number of 'mature' words a target has, using the following settings:

"show_info_deck_browser": [
    {
        "lang": "ES",
        "target": "*"
    },
    {
        "lang": "EN",
        "target": "*"
    },
    {
        "lang": "ES",
        "target": "id_of_target"
    },
    {
        "lang": "EN",
        "target": "id_of_target"
    }
],
"show_info_toolbar": [
    {
        "lang": "ES",
        "target": "*"
    }
]

Notes:

Target Corpus data

A 'corpus data set' contains all the information related to the content of a note that is used to calculate the ranking of a card (such as the "familiarity" of a word).

Every target has one or more 'corpus data' sets, depending on how many fields are defined in the target and how the corpus_segmentation_strategy is set.

By default, corpus_segmentation_strategy is set to "by_lang_data_id", which means that a corpus data set will be created for every unique language_data_id:

{"Front": "EN", "Back": "EN"} // <- A single corpus data set
{"Front": "EN", "Back": "EN", "Extra": "ES"} // <- Two corpus data sets

To create separate corpus data sets for each field, you can set corpus_segmentation_strategy to "by_note_model_id_and_field_name". This will create a corpus data set for each field in the target:

{"Front": "EN", "Back": "EN"} // <- Two corpus data sets
{"Front": "EN", "Back": "EN", "Extra": "ES"} // <- Three corpus data sets
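The counts in the examples above can be reproduced with a small Python sketch that derives one key per corpus data set. The function name `corpus_set_keys` is hypothetical, and the note type name is used here as a stand-in for the note model id; the real grouping keys inside FrequencyMan may differ.

```python
def corpus_set_keys(notes: list[dict], strategy: str = "by_lang_data_id") -> set:
    """Return one key per corpus data set that the given notes would produce."""
    keys = set()
    for note in notes:
        for field_name, lang_data_id in note["fields"].items():
            if strategy == "by_lang_data_id":
                # One set per unique language data id.
                keys.add(lang_data_id)
            elif strategy == "by_note_model_id_and_field_name":
                # One set per field of each note type (name used as stand-in id).
                keys.add((note["name"], field_name))
            else:
                raise ValueError(f"unknown strategy: {strategy}")
    return keys


two_fields = [{"name": "Basic", "fields": {"Front": "EN", "Back": "EN"}}]
three_fields = [{"name": "Basic", "fields": {"Front": "EN", "Back": "EN", "Extra": "ES"}}]

for notes in (two_fields, three_fields):
    by_lang = len(corpus_set_keys(notes))
    by_field = len(corpus_set_keys(notes, "by_note_model_id_and_field_name"))
    print(by_lang, by_field)
# prints "1 2" and then "2 3"
```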

Things to note:

Word frequency lists

FrequencyMan comes with 50+ default word frequency lists. These are generated using one of the following sources:

The default word frequency lists can be found in the \default_wf_lists directory. When you are prompted to create a new language data directory with a default word frequency list, the relevant file is copied to the new language data directory, such as \user_files\lang_data\en.

The user_files directory

The user_files directory can be found inside FrequencyMan's plugin directory, which can be accessed via: Tools > Add-ons > (Select FrequencyMan) > View Files.

Any files placed in this folder will be preserved when the add-on is upgraded. All other files in the add-on folder are removed on upgrade.

Manual installation from GitHub

  1. Go to the Anki plugin folder, such as C:\Users\%USERNAME%\AppData\Roaming\Anki2\addons21.
  2. Create a new folder with the name FrequencyMan.
  3. Make sure you are still in the directory addons21.
  4. Run: git clone https://github.com/Rct567/FrequencyMan.git FrequencyMan
  5. Start Anki.