gfngfn / SATySFi

A statically-typed, functional typesetting system
GNU Lesser General Public License v3.0

Proposal: I18n: Support non-English hyphenation dictionaries #232

Open na4zagin3 opened 4 years ago

na4zagin3 commented 4 years ago

This proposal adds support for hyphenating non-English languages. It is the first step toward supporting internationalization.

Proposal

load-hyphen-pattern language loads a hyphenation dictionary from hyph/<language>.satysfi-hyph. It raises an exception when the file is not found.

set-hyphen-pattern hyph ctx stores the hyphenation pattern hyph in ctx.hyphenation_pattern.

get-hyphen-pattern ctx returns the hyphenation pattern stored in ctx.hyphenation_pattern.
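As a rough illustration of the intended semantics, the three primitives can be modeled in Python. This is only a sketch, not the SATySFi implementation: the `Context` and `HyphenPattern` classes, the `root` parameter, and the choice of `FileNotFoundError` are assumptions made for the example. Only the file layout `hyph/<language>.satysfi-hyph` and the field name `hyphenation_pattern` come from the proposal.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class HyphenPattern:
    """Hypothetical stand-in for a loaded hyphenation dictionary."""
    language: str
    raw: str  # unparsed file contents; real patterns would be parsed


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for a SATySFi context."""
    hyphenation_pattern: Optional[HyphenPattern] = None


def load_hyphen_pattern(language: str, root: Path = Path("hyph")) -> HyphenPattern:
    # Looks up <root>/<language>.satysfi-hyph and raises if it is missing.
    path = root / f"{language}.satysfi-hyph"
    if not path.is_file():
        raise FileNotFoundError(f"hyphenation dictionary not found: {path}")
    return HyphenPattern(language, path.read_text(encoding="utf-8"))


def set_hyphen_pattern(hyph: HyphenPattern, ctx: Context) -> Context:
    # Returns a new context; the original context is left unchanged.
    return replace(ctx, hyphenation_pattern=hyph)


def get_hyphen_pattern(ctx: Context) -> Optional[HyphenPattern]:
    return ctx.hyphenation_pattern
```

In this model the context is immutable, so set-hyphen-pattern produces an updated copy rather than mutating its argument.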

Current Implementation

Alternative Options

Activate multiple hyphen-dicts at the same time

This proposal is based on a design in which users replace the English hyphenation pattern with another language's. If we decide to extend the multi-language system, in which English and Japanese are automatically detected by script type, it may be more natural to set a hyphenation dictionary for each language/script (i.e., set-hyphen-dict : language-tag -> hyphen-dict -> ctx -> ctx or set-hyphen-dict : hyphen-dict language-tag-map -> ctx -> ctx) rather than applying a given hyphenation pattern globally.
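The per-language alternative could look roughly like the following Python model. It is a sketch under assumptions: `MultiDictContext` and `hyphen_dict_for` are hypothetical names, language tags are plain strings, and a dictionary is represented by an opaque placeholder string.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class MultiDictContext:
    """Hypothetical context holding one hyphenation dictionary per language tag."""
    hyphen_dicts: Mapping[str, str] = field(default_factory=dict)


def set_hyphen_dict(language_tag: str, hyphen_dict: str,
                    ctx: MultiDictContext) -> MultiDictContext:
    # Copy the mapping so the original context is left unchanged.
    updated = dict(ctx.hyphen_dicts)
    updated[language_tag] = hyphen_dict
    return MultiDictContext(hyphen_dicts=updated)


def hyphen_dict_for(language_tag: str, ctx: MultiDictContext) -> Optional[str]:
    # Returns None when no dictionary has been registered for the tag.
    return ctx.hyphen_dicts.get(language_tag)
```

With this shape, the typesetter would pick a dictionary according to the detected language of each run of text instead of using one global pattern.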

Introducing new type hyphen-dict

Instead of introducing hyphen-dict and having users handle hyphenation dictionaries explicitly, we could provide primitives that get and set strings representing languages (e.g., set-hyphen-dict : string -> ctx -> ctx).

However, a hyphen-dict type leaves more extension points open for the future (e.g., tweaking hyphenation patterns or adding exceptional words ad hoc).

load-hyphen-dict throwing exceptions

load-hyphen-dict could instead have the signature load-hyphen-dict : string -> hyphen-dict option. I don't have a strong opinion about this. I was thinking of having a new package for each language, so specifying a wrong filename is unlikely.
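For comparison, the option-returning variant can be sketched in Python with `Optional` standing in for `hyphen-dict option`. The function name `load_hyphen_dict_opt`, the `root` parameter, and the use of raw file contents as the dictionary value are assumptions for illustration only.

```python
from pathlib import Path
from typing import Optional


def load_hyphen_dict_opt(language: str, root: Path = Path("hyph")) -> Optional[str]:
    # Returns None instead of raising when the dictionary file is missing,
    # so the caller must handle the "not found" case explicitly.
    path = root / f"{language}.satysfi-hyph"
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")
```

The trade-off is the usual one: the option type forces every caller to check the result, while the exception keeps the common case shorter when a missing file is unlikely.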

Having a primitive to get available hyphenation dictionary files

I could include another primitive, get-hyph-dict-list, that returns the dictionaries available under hyph/ (for example, returning [ "en" ]). This primitive is not mandatory.
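A listing primitive of this kind might behave like the Python sketch below. The `root` parameter, the sorted order, and the empty result for a missing directory are assumptions, not part of the proposal.

```python
from pathlib import Path
from typing import List

SUFFIX = ".satysfi-hyph"


def get_hyph_dict_list(root: Path = Path("hyph")) -> List[str]:
    # Returns the language part of every <language>.satysfi-hyph file in root.
    if not root.is_dir():
        return []
    return sorted(
        p.name[: -len(SUFFIX)]
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(SUFFIX)
    )
```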

Renaming english.satysfi-hyph to en.satysfi-hyph

We could leave the filename as is. However, considering that even TeX has already adopted a naming scheme based on BCP 47 language tags, there is no reason to stick to the traditional scheme of naming files after languages in English.

na4zagin3 commented 3 years ago

May I consider this proposal approved? If so, I’ll work on this after the refactoring is done.