aaronhktan / jyut-dict

A free, open-source, offline Cantonese Dictionary for Windows, Mac, and Linux. Qt, SQLite. C++ and Python.
https://jyutdictionary.com
MIT License

Additional romanization schemes #44

Closed LawranceFung closed 2 years ago

LawranceFung commented 2 years ago

Would you consider adding more romanization schemes, or accepting a pull request for the feature?

aaronhktan commented 2 years ago

Yes! It's something I've been looking at (see issue #33 - the discussion is in French but it should be easy to machine translate for understanding).

Some thoughts about it, which I'll write down here as a note:

  1. The preferences file currently has four options (prefer Jyutping, prefer Pinyin, only Jyutping, only Pinyin) — the enum is defined here. Adding more romanization schemes would entail completely rethinking the preference options. My thoughts: split the option up into five different groups of options:
    • [Show Cantonese romanization - boolean yes/no]
    • [Show Mandarin romanization - boolean yes/no]
    • [Prioritize Cantonese romanization over Mandarin - boolean yes/no]
    • [Select Cantonese romanizations to show - list of options]
    • [Select Mandarin romanizations to show - list of options]
  2. Changing the preferences means that we'd also need to migrate old values to the new options (e.g. old "Prefer Jyutping" -> new "Show Mandarin yes/Show Cantonese yes/Prioritize Cantonese yes/Mandarin romanizations [Pinyin]/Cantonese romanizations [Jyutping]").
  3. These preferences are referenced in a bunch of places, each of which would have to take the new enum(s) into account:
    • ResultListDelegate, where the search results are displayed
    • EntryHeaderWidget, where the Jyutping/Pinyin are currently displayed in the detail view
    • SentenceContentWidget, where Jyutping/Pinyin are displayed for example sentences
    • Entry, the class that implements the data structure for all entries in the dictionary.
  4. For specific romanization schemes, it would be nice to have:
    • Cantonese: Yale (second-most-used), Cantonese Pinyin (promoted by HK government), IPA (for linguists). Conversions from Jyutping to these romanization schemes are implemented in a Wiktionary module as well as in PyCantonese; an implementation for Jyut Dictionary could probably be inspired by the approaches taken by those scripts.
    • Mandarin: Zhuyin (used in Taiwan), IPA (for linguists). I will admit that I am less familiar with what's in common use for Mandarin romanization (does anybody still use Tongyong Pinyin?), but I believe Pinyin+Zhuyin+IPA should cover most bases. See Wiktionary module and Dragonmapper for some prior art.
  5. As discussed in #33, it would be nice to have a notice next to the non-Jyutping/non-Pinyin entries warning that they are auto-generated and not necessarily accurate.
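
To make items 1 and 2 a bit more concrete, here is a minimal sketch of the split preferences and the migration from the four existing options. It is written in Python for brevity and is not the app's actual settings code: `LegacyOption`, `RomanizationPrefs`, and `migrate` are hypothetical names, and keeping both default scheme lists for the "only" options is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class LegacyOption(Enum):
    """The four existing options (member names are hypothetical)."""
    PREFER_JYUTPING = "prefer_jyutping"
    PREFER_PINYIN = "prefer_pinyin"
    ONLY_JYUTPING = "only_jyutping"
    ONLY_PINYIN = "only_pinyin"


@dataclass
class RomanizationPrefs:
    """The five proposed groups of options."""
    show_cantonese: bool
    show_mandarin: bool
    prioritize_cantonese: bool
    cantonese_schemes: List[str] = field(default_factory=lambda: ["Jyutping"])
    mandarin_schemes: List[str] = field(default_factory=lambda: ["Pinyin"])


def migrate(old: LegacyOption) -> RomanizationPrefs:
    """Map one legacy option onto the new preference groups."""
    return RomanizationPrefs(
        # "Only Pinyin" is the one legacy option that hides Cantonese.
        show_cantonese=old is not LegacyOption.ONLY_PINYIN,
        # "Only Jyutping" is the one legacy option that hides Mandarin.
        show_mandarin=old is not LegacyOption.ONLY_JYUTPING,
        # Both Jyutping-flavoured options put Cantonese first.
        prioritize_cantonese=old in (
            LegacyOption.PREFER_JYUTPING,
            LegacyOption.ONLY_JYUTPING,
        ),
    )
```

With this mapping, `migrate(LegacyOption.PREFER_JYUTPING)` yields the combination given as the example in item 2: both languages shown, Cantonese prioritized, and the scheme lists set to [Jyutping] and [Pinyin].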

It would be a non-trivial amount of work. (Which is probably why I've been putting it off for a while, in addition to there being a global pandemic and all.) If somebody is willing to make a PR, I'd be glad to take a look!

LawranceFung commented 2 years ago

I don't speak French. Sorry, I'm not Canadian! I'll Google or Bing translate it for context.
I think all the Romanization schemes you mentioned have value and possibly a place in the app. I'll admit I'm interested in including one I devised myself - http://www.cantonese.sheik.co.uk/phorum/read.php?1,150054 - meant to preserve historical distinctions and compact digraphs for the sake of quick typing and accurate character selection.

aaronhktan commented 2 years ago

Hmm, I thought I left a comment here but it doesn't show up. Anyway, that looks cool! I would prioritize Yale / IPA since they are more widespread for the moment – but if I can get the system going, I'll definitely look into your romanization system.
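
For a sense of what a Yale conversion might involve, here is a rough sketch in Python. It is a simplified illustration, not the Wiktionary module's or PyCantonese's approach: it only handles the tone-number variant of Yale (no diacritics or low-tone "h"), the function and class names are hypothetical, and the rule set is a handful of well-known spelling differences rather than a complete one. The `auto_generated` flag is an assumed way to carry the warning discussed in #33.

```python
import re
from dataclasses import dataclass

# Optional initial, final, tone digit 1-6.
_SYLLABLE = re.compile(r"^(ng|gw|kw|[bpmfdtnlgkhwzcsj])?([a-z]+)([1-6])$")

# Initials that are spelled differently in Yale.
_INITIALS = {"z": "j", "c": "ch", "j": "y"}


@dataclass
class Romanization:
    text: str
    auto_generated: bool  # True when derived by rule rather than from data


def _syllable_to_yale(syllable: str) -> str:
    match = _SYLLABLE.match(syllable.lower())
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {syllable!r}")
    initial = match.group(1) or ""
    final = match.group(2)
    tone = match.group(3)

    if initial == "j" and final.startswith("yu"):
        initial = ""  # Jyutping "jyu..." is written "yu..." in Yale
    else:
        initial = _INITIALS.get(initial, initial)

    # Jyutping "oe" and "eo" are both written "eu" in Yale.
    final = final.replace("oe", "eu").replace("eo", "eu")
    # A long "aa" with no coda is written as a single "a".
    if final == "aa":
        final = "a"
    return initial + final + tone


def jyutping_to_yale(jyutping: str) -> Romanization:
    """Convert space-separated Jyutping syllables to numbered Yale."""
    syllables = [_syllable_to_yale(s) for s in jyutping.split()]
    return Romanization(text=" ".join(syllables), auto_generated=True)
```

For example, `jyutping_to_yale("jyut6 ping3").text` gives `"yut6 ping3"`. A UI could check `auto_generated` to decide whether to show a warning next to the converted text.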

Also, for housekeeping reasons, I'm going to close this as a dupe of #33, just to keep the issues page cleaner. Let me try the keyword:

Duplicate of #33