StephanAkkerman / FluentAI

Automating language learning with the power of Artificial Intelligence. This repository presents FluentAI, a tool that combines Fluent Forever techniques with AI-driven automation. It streamlines the process of creating Anki flashcards, making language acquisition faster and more efficient.
https://akkerman.ai/FluentAI/
MIT License

Filter nonsense words from IPA dataset #49

Open StephanAkkerman opened 3 weeks ago

StephanAkkerman commented 3 weeks ago
  1. Description:

    • Problem: Many words that seem to have little to no meaning show up in word2mnemonic results.

    • Solution: Find a method to keep only words that are in common use, and consider only those for phonetic similarity.

    • Prerequisites: [List any requirements or dependencies needed before starting.]

  2. Tasks:

    • Look for a filter that keeps the words we want (commonly used, but also colloquial). The kept words may include curse words or slang, because those are easily remembered.
    • Apply the filter to the dataset and save the result as the new IPA dataset.
  3. Additional context: For instance, for the Chinese word for cat it comes up with words like mauer and other uncommon words. I would rather see something like miao at the top.
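
One possible shape for the two tasks above, as a rough sketch rather than the project's actual implementation: keep an entry when its word is frequent enough or appears on an allow-list of slang and colloquial words. The function name `filter_ipa_entries`, the `(word, ipa)` tuple layout, the threshold, the frequency numbers, and the IPA strings are all illustrative assumptions; real frequencies would have to come from a word-frequency source of your choice.

```python
def filter_ipa_entries(entries, frequency, min_frequency=1e-6, always_keep=()):
    """Keep (word, ipa) pairs that are common enough or explicitly allow-listed.

    entries:       iterable of (word, ipa) tuples (hypothetical dataset layout)
    frequency:     dict mapping lowercase word -> relative frequency (assumed input)
    min_frequency: words below this frequency are dropped (arbitrary default)
    always_keep:   slang, curse words, etc. that should survive regardless
    """
    keep = {w.lower() for w in always_keep}
    result = []
    for word, ipa in entries:
        key = word.lower()
        # Unknown words get frequency 0.0, so they are dropped unless allow-listed.
        if key in keep or frequency.get(key, 0.0) >= min_frequency:
            result.append((word, ipa))
    return result


# Made-up data for illustration only; the IPA strings are placeholders.
entries = [("meow", "miˈaʊ"), ("mauer", "ˈmaʊɚ"), ("mouse", "maʊs")]
frequency = {"meow": 2e-6, "mouse": 3e-5, "mauer": 1e-8}

filtered = filter_ipa_entries(entries, frequency, always_keep={"miao"})
kept_words = [word for word, _ in filtered]
```

With these made-up numbers, `mauer` falls below the threshold and is dropped, while `meow` and `mouse` stay. The filtered list could then be written back out as the new dataset.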

StephanAkkerman commented 2 weeks ago

We could also add proper nouns: common English names, places, etc.
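
A minimal sketch of how such extra lists might be merged into a single allow-list, assuming each source is just a list of strings. The helper name `build_allow_list` and the sample names and places are hypothetical, not taken from the repository.

```python
def build_allow_list(*word_lists):
    """Merge several word lists into one lowercase set, skipping blank entries."""
    allow = set()
    for words in word_lists:
        for word in words:
            cleaned = word.strip().lower()
            if cleaned:
                allow.add(cleaned)
    return allow


# Illustrative inputs only; real lists would come from whatever sources are chosen.
names = ["Emma", "Liam ", ""]
places = ["Paris", "Texas", "paris"]

allow_list = build_allow_list(names, places)
```

Lowercasing and stripping whitespace means duplicates such as `Paris` and `paris` collapse to one entry, so the set can be compared directly against lowercased dataset words.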