davidpomerenke / alphabetify

Learn a new alphabet by reading a good text in your native alphabet with more and more foreign letters.
https://alphabetify.js.org
GNU General Public License v3.0

Alphabetify


Alphabetify makes learning new alphabets easy. Copy and paste any text: the breaking news, your favourite book, or a piece of homework. Alphabetify transforms the text for you. It starts with all symbols in your native alphabet, then gradually introduces more and more foreign characters, so that you get used to them and pick up the alphabet almost without effort. Try it at https://alphabetify.js.org.

Example

import { alphabetify } from 'alphabetify'

const text =
'Tell me, O muse, of that ingenious hero who travelled far and wide after he had sacked the famous town of Troy. Many cities did he visit, and many were the nations with whose manners and customs he was acquainted; moreover he suffered much by sea while trying to save his own life and bring his men safely home; but do what he might he could not save his men, for they perished through their own sheer folly in eating the cattle of the Sun-god Hyperion; so the god prevented them from ever reaching home. Tell me, too, about all these things, O daughter of Jove, from whatsoever source you may know them.'

alphabetify(text, 'grek-grc', 'en')
  .then(result => console.log(result))

// Tell me, O muse, of thαt ingenious hero who trαvelled fαr ἀnd wiδe ἀfter he ἁδ sαckεδ thε fαmous town of Troy. Mαny citiεs δiδ ἑ visit, ἀnδ mαny wεrε θε nαtions wιθ whosε mαnnεrs ἀnδ κustoms ἑ wαs ἀκquαιntεδ; morεovεr ἑ suffεrεδ muκh βι sεα whιλε trιιγγ to sαvε ἱs owν λιfε ἀνδ βrιγγ ἱs μεν sαfελι ὁμε; βut δο whαt ἑ μιγht hε κοuλδ νοt sαvε hις μεν, fορ θει περισhεδ θροuγh θειρ ὀwν σhεερ fολλι ἰν ἐατιγγ θε καττλε ὀφ θε Σουν-γοδ Hιπεριον; σο θε γοδ πρεουεντεδ θεμ φρομ εουερ ῥεαχιγγ ὁμε. Τελλ με, τοο, ἀβοουτ ἀλλ θεσε θιγγς, O δαυχτερ οφ Dιοουε, φρομ ὀυχατσοεουερ σοουρκε ἰοου μει κνοου θεμ.

Syntax

alphabetify(text, alphabet, [lang, [pre, [post]]])

Parameters

text

The string of the original text which should be transliterated, in the source alphabet. It may be very long.

alphabet

The code string specifying the target alphabet:

| Code     | Alphabet          | Quality |
|----------|-------------------|---------|
| cyrl-ru  | Russian           | ★★      |
| grek-el  | Modern Greek      | ★★★     |
| grek-grc | Ancient Greek     | ★★★★    |
| hira     | Japanese Hiragana | ★       |
| kana     | Japanese Katakana | ★       |

lang optional

The code string specifying the original language. (Eurocentrically, the original alphabet is always Latin.)

Specifying the original language adds some minor language-specific rules. For example, in English the letter v will be processed in a similar way to the letter w, while in German v will be processed in a similar way to f.

If the original language is unspecified, English will be assumed.

| Code | Language |
|------|----------|
| de   | German   |
| en   | English  |
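
The effect of lang can be pictured with a small sketch. The rule list and the helper rulesFor below are hypothetical and only mirror the v example with Cyrillic targets; they are not taken from the alphabetify source, and treating a missing third element as "applies to every language" is an assumption.

```javascript
// Hypothetical rules in the form [pattern, replacement, lang].
// Assumption: rules without a lang entry apply to every source language.
const exampleRules = [
  ['w', 'в'],
  ['f', 'ф'],
  ['v', 'в', 'en'], // English v is treated like w
  ['v', 'ф', 'de'] // German v is treated like f
]

// Select the rules for one source language; English is assumed by default.
function rulesFor (rules, lang = 'en') {
  return rules
    .filter(([, , ruleLang]) => ruleLang === undefined || ruleLang === lang)
    .map(([a, b]) => [a, b])
}

console.log(rulesFor(exampleRules, 'de'))
```

With 'de' the list keeps the two general rules plus the German rule for v; with no lang argument the English rule is selected instead.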

pre optional

A number m with 0 ≤ m ≤ 1 - n (where n is the value of post) specifying the proportion of the text, counted from the beginning, that should not be transliterated at all. 0 by default.

post optional

A number n with 0 ≤ n ≤ 1 - m (where m is the value of pre) specifying the proportion of the text, counted from the end, that should be transliterated completely. 0 by default.
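
To make the constraint on pre and post concrete, here is a minimal sketch. The function name transliterationLevel and the linear ramp between the two regions are assumptions for illustration; this README only states that the opening part is left untouched, the closing part is fully transliterated, and foreign characters are introduced gradually.

```javascript
// Returns a value between 0 (nothing transliterated) and 1 (everything
// transliterated) for a relative position in the text (0 = start, 1 = end).
// The linear ramp is an assumption, not necessarily what alphabetify does.
function transliterationLevel (position, pre = 0, post = 0) {
  if (pre < 0 || post < 0 || pre + post > 1) {
    throw new RangeError('pre and post must be non-negative and sum to at most 1')
  }
  if (position < pre) return 0 // untouched opening part
  if (position >= 1 - post) return 1 // fully transliterated closing part
  return (position - pre) / (1 - post - pre)
}

console.log(transliterationLevel(0.1, 0.25, 0.25)) // 0
console.log(transliterationLevel(0.5, 0.25, 0.25)) // 0.5
console.log(transliterationLevel(0.9, 0.25, 0.25)) // 1
```

The check pre + post > 1 is the same condition as the two inequalities above: neither region may overlap the other.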

Return value

A promise that resolves to the string with the increasingly transliterated text. You can process it by appending something like .then(output => process(output)).catch(error => console.error(error)) to the function call.
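
The promise can also be awaited instead of chained. The sketch below takes the transliteration function as a parameter (named transliterate here, a hypothetical stand-in for the imported alphabetify) so that the snippet runs on its own; falling back to the original text on failure is an illustrative choice, not behaviour of the library.

```javascript
// Await a promise-returning transliteration function and fall back to the
// untransliterated text if the promise rejects.
async function transliterateOrOriginal (transliterate, text, alphabet, lang = 'en') {
  try {
    return await transliterate(text, alphabet, lang)
  } catch (error) {
    console.error('Transliteration failed:', error)
    return text
  }
}

// With the real function, inside an async function or an ES module:
// const output = await transliterateOrOriginal(alphabetify, text, 'grek-grc')
```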

Usage in browser

This module makes use of the fs module, which is available in Node.js but not in the browser. For usage in the browser, bundle the module with Webpack or an equivalent tool, and have a look at the configuration in this repository in webpack.config.js, docs/webpack-entry.js and docs/index.js.

Development

Transliteration rules are converted from a short form (e.g., involving only lowercase letters) in alphabets/src/ to a long form in alphabets/build/. This is done by the alphabets/preprocess module, which is run with npm run preprocess. The code, including the resulting long form rules, is bundled for web use by running npm run bundle.
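
As an illustration of what such a preprocessing step might do, the sketch below expands lowercase-only rules into a lowercase and a capitalised variant. The function expandCase is hypothetical; the actual logic lives in alphabets/preprocess and may differ.

```javascript
// Uppercase the first character of a string, e.g. 'th' -> 'Th', 'δ' -> 'Δ'.
const capitalise = s => s.charAt(0).toUpperCase() + s.slice(1)

// Expand each short rule [a, b] into itself plus a capitalised variant.
function expandCase (shortRules) {
  return shortRules.flatMap(([a, b]) => [
    [a, b],
    [capitalise(a), capitalise(b)]
  ])
}

console.log(expandCase([['d', 'δ'], ['th', 'θ']]))
// → [['d', 'δ'], ['D', 'Δ'], ['th', 'θ'], ['Th', 'Θ']]
```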

Transliteration rules

This may be a bit abstract. Have a look at some of the JSON files in the alphabets folder to get a better understanding of the notation.

Short form

The short form files are found in the folder alphabets/src. If you would like to improve the rule sets, this is the place to look. Each file consists of a specification of the alphabet in regex terms (e.g., a-z for Latin), some optional macros (that is, rules that are applied to the rules themselves), and the list of rule blocks. Each rule block consists of rules in the short form.

The short form is a tuple [a, b, lang]:

Long form

The long form will be generated automatically from the short form by running npm run preprocess.

The long form is a pair [a, b] where alphabetify will apply text.replace(new RegExp(a, 'g'), b) to the input text for each pair, in the order of their appearance:
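
The replacement loop described above can be sketched as follows. The function name applyRules and the three example rules are made up for illustration; the real rules are the generated ones in alphabets/build.

```javascript
// Apply each long-form rule [a, b] to the text, in order of appearance.
function applyRules (text, rules) {
  return rules.reduce(
    (current, [a, b]) => current.replace(new RegExp(a, 'g'), b),
    text
  )
}

// Made-up rules. Order matters: the rule for 'th' only fires if no earlier
// rule has already replaced the 't' or the 'h'.
const rules = [['th', 'θ'], ['a', 'α'], ['e', 'ε']]

console.log(applyRules('the path', rules)) // θε pαθ
```

Because a is passed to new RegExp, it is interpreted as a regular expression rather than a literal string, so special characters in a pattern need escaping.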

Note on transliteration

Feedback

The preferred way of contributing is via issues and pull requests in this repo. I also started some Reddit threads to gather feedback from non-coders: