alexharri / beygla

Tiny (5kB gzipped) declension helper for Icelandic names.
MIT License

Beygla

Tiny (5kB gzipped) declension helper for Icelandic names

applyCase("ef", "Jóhannes");
//=> "Jóhannesar"

applyCase("þgf", "Helga Fríða Smáradóttir");
//=> "Helgu Fríðu Smáradóttur"

Why does beygla exist?

Icelandic names have four cases:

Guðmundur   →  Nominative case (nefnifall)
Guðmund     →  Accusative case (þolfall)
Guðmundi    →  Dative case (þágufall)
Guðmundar   →  Genitive case (eignarfall)

The different cases are used depending on the context in which the name is used.

The names of Icelandic users are stored in the nominative case (nefnifall). This poses a challenge when a name has to be used in a sentence, for example:

The document has been sent to Guðmundur

Translated to Icelandic, this reads:

Skjalið hefur verið sent á Guðmundur

To an Icelander, this is jarring. The name appears in the nominative case „Guðmundur“, but it should be in the accusative case „Guðmund“.

Rewritten to use the nominative case, we get:

Guðmundur hefur fengið skjalið sent

But we've now changed the message entirely!

| Before | After |
| --- | --- |
| _The document has been sent to Guðmundur_ | _Guðmundur has received the document_ |

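This forces an Icelandic content writer to degrade the user experience, either by using the name in the wrong case or by rewording the message so that the name stays in the nominative case.

Being able to decline (transform) names into the correct case would remove this problem entirely.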

Unfortunately, Icelandic name declension has lots of rules, with lots of exceptions.

# Left is nominative case, right is accusative case

Gauti → Gauta
Jóhanna → Jóhönnu
Snæfríður → Snæfríði
Alex → Alex
Bjarnfreður → Bjarnfreð

Encoding these rules, and their exceptions, is hard and can take up a lot of space. Developers don't want to add hundreds of kilobytes to the bundle size, just to apply cases to names.

Well, beygla encodes these rules in just 5 kilobytes gzipped.[^*]

[^*]: Declension rules are encoded using cases for 3647 out of 4505 Icelandic names (81%). The data for the cases is from bin.arnastofnun.is.

Usage

Install beygla as an npm package:

npm i -S beygla

Beygla exports a single function named applyCase.

import { applyCase } from "beygla";

applyCase("ef", "Jóhannes");
//=> "Jóhannesar"

applyCase("þgf", "Helga Dís Smáradóttir");
//=> "Helgu Dís Smáradóttur"

applyCase accepts two parameters: a case and a name (in the nominative case[^nom]).

The return value is a string with the name declined to the desired case.

[^nom]: If the provided name is not in the nominative case, applyCase is likely to yield an unexpected value.
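
To tie this back to the motivating example, here is a minimal sketch of a message template that declines the recipient's name before interpolating it. The `documentSentMessage` helper and the `Decline` type are hypothetical and exist only for illustration. The sketch takes the declension function as a parameter so that it runs without the package installed; in an application you would pass beygla's `applyCase` instead of the stub.

```ts
// Hypothetical types and helper, for illustration only (not part of beygla's API).
type IcelandicCase = "nf" | "þf" | "þgf" | "ef";
type Decline = (caseName: IcelandicCase, name: string) => string;

// Builds "The document has been sent to <name>" with the name in the accusative.
function documentSentMessage(decline: Decline, nominativeName: string): string {
  return `Skjalið hefur verið sent á ${decline("þf", nominativeName)}`;
}

// Stand-in for beygla's applyCase. It only knows the single example
// used in this README and returns every other name unchanged.
const stubApplyCase: Decline = (caseName, name) =>
  caseName === "þf" && name === "Guðmundur" ? "Guðmund" : name;

console.log(documentSentMessage(stubApplyCase, "Guðmundur"));
//=> "Skjalið hefur verið sent á Guðmund"
```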

Cases

The following cases may be provided as the first argument to applyCase:

| Case (English) | Case (Icelandic) | Value (English) | Value (Icelandic) |
| --- | --- | --- | --- |
| Nominative | Nefnifall | "nom" | "nf" |
| Accusative | Þolfall | "acc" | "þf" |
| Dative | Þágufall | "dat" | "þgf" |
| Genitive | Eignarfall | "gen" | "ef" |

If a case not in the table above is provided, "nf" is used as a fallback (i.e. nothing is done).
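
The table and the fallback rule can be pictured as a small lookup. The sketch below is an assumption about how such a mapping could be written, not beygla's actual implementation; `normalizeCase` and `CASE_VALUES` are hypothetical names.

```ts
// Hypothetical helper mirroring the table above; not part of beygla's API.
type IcelandicCase = "nf" | "þf" | "þgf" | "ef";

// Both the English and the Icelandic values map to one canonical value.
const CASE_VALUES = new Map<string, IcelandicCase>([
  ["nom", "nf"],
  ["acc", "þf"],
  ["dat", "þgf"],
  ["gen", "ef"],
  ["nf", "nf"],
  ["þf", "þf"],
  ["þgf", "þgf"],
  ["ef", "ef"],
]);

// Unknown values fall back to the nominative, i.e. the name is left as it is.
function normalizeCase(value: string): IcelandicCase {
  return CASE_VALUES.get(value) ?? "nf";
}

console.log(normalizeCase("acc")); //=> "þf"
console.log(normalizeCase("vocative")); //=> "nf"
```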

Whitespace

If the name includes superfluous whitespace, `applyCase` removes it.

```tsx
applyCase("þgf", "  \n  Helga  Dís\tSmáradóttir \n\n");
//=> "Helgu Dís Smáradóttur"
```
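
For readers who want to apply the same clean-up to names before storing them, whitespace normalization of this kind can be sketched as below. This is an assumption about the general technique, not beygla's own code, and `normalizeWhitespace` is a hypothetical name.

```ts
// Hypothetical helper: trims the name and collapses every run of
// whitespace (spaces, tabs, newlines) into a single space.
function normalizeWhitespace(name: string): string {
  return name.trim().split(/\s+/).join(" ");
}

console.log(normalizeWhitespace("  \n  Helga  Dís\tSmáradóttir \n\n"));
//=> "Helga Dís Smáradóttir"
```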

Addresses

The `beygla/addresses` module allows you to apply declension to Icelandic addresses and place names:

```ts
import { applyCase } from "beygla/addresses";

applyCase("þf", "Rauðalækur 63");
//=> "Rauðalæk 63"

applyCase("ef", "Reykjavík");
//=> "Reykjavíkur"

applyCase("þgf", "Þjórsárdalur");
//=> "Þjórsárdal"
```

Its behavior is the same as the regular `beygla` module, except it contains data that allows it to apply cases to Icelandic addresses and place names instead of person names. All of the same pattern matching behaviors and limitations apply.

The `beygla/addresses` module is around 4.9kB gzipped.

Correctness

Beygla will correctly apply the desired case to the input name in most cases.

Most Icelandic names (81%), especially common ones, are present on [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/). Beygla is guaranteed to produce a correct result for those names.

This does not mean that Beygla produces an incorrect result for the other 19% of names. Beygla finds patterns in name endings based on the data on [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/) and applies those patterns to any input name. This means that beygla will produce a correct result for most names, even if the name is not in the dataset from [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/).

I tried randomly sampling 20 names from the list of legal Icelandic names not present in [bin.arnastofnun.is](https://bin.arnastofnun.is/gogn/):

* 14 names matched a pattern with the correct result
* 6 names matched no pattern
* 0 names matched a pattern with an incorrect result

Even though I happened to get no incorrect results, this is a very small sample. I'm absolutely certain that there are a handful of names that will produce incorrect results.

See [beygla.spec.ts](https://github.com/alexharri/beygla/blob/master/lib/beygla.spec.ts).
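
To give a feel for what matching on name endings means, here is a toy sketch that tries the longest ending first. The rule table is an illustrative assumption built from the accusative examples earlier in this README. It is not beygla's real data or encoding, and the rules are not valid for Icelandic names in general.

```ts
// Toy accusative rules keyed by nominative ending (illustrative assumption).
const ACCUSATIVE_RULES = new Map<string, string>([
  ["ur", ""], // Guðmundur → Guðmund, Bjarnfreður → Bjarnfreð
  ["ríður", "ríði"], // Snæfríður → Snæfríði (more specific than "ur")
  ["i", "a"], // Gauti → Gauta
]);

// Tries the longest ending first so that specific rules beat general ones.
// Returns the name unchanged when no ending matches.
function toAccusative(name: string): string {
  for (let length = name.length; length > 0; length--) {
    const ending = name.slice(name.length - length);
    const replacement = ACCUSATIVE_RULES.get(ending);
    if (replacement !== undefined) {
      return name.slice(0, name.length - length) + replacement;
    }
  }
  return name;
}

console.log(toAccusative("Snæfríður")); //=> "Snæfríði"
console.log(toAccusative("Bjarnfreður")); //=> "Bjarnfreð"
console.log(toAccusative("Alex")); //=> "Alex"
```

A name such as Jóhanna → Jóhönnu changes a vowel inside the stem, so a realistic rule set needs longer endings and many more entries than this toy table.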

Strict mode

Beygla provides a strict version, accessible under `beygla/strict`, which guarantees that declensions are only applied to legal Icelandic names.

```tsx
import { applyCase } from "beygla/strict";
```

The interface for `beygla/strict` is exactly the same as for `beygla`.

Restricting declension to legal Icelandic names may be desirable when you would rather leave a foreign name unchanged than risk applying an incorrect declension to it.

The `beygla/strict` module is 15kB gzipped, three times the size of the standard `beygla` module.
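
The trade-off can be illustrated with a small wrapper that only declines names found in an allowlist. This is a conceptual sketch and not how `beygla/strict` is implemented. `strictOnly`, `permissive`, and the one-entry allowlist are hypothetical, and the sketch handles a single name rather than a full name for brevity.

```ts
type Decline = (caseName: string, name: string) => string;

// Hypothetical wrapper: declines a name only if it is in the allowlist.
function strictOnly(decline: Decline, legalNames: ReadonlySet<string>): Decline {
  return (caseName, name) => (legalNames.has(name) ? decline(caseName, name) : name);
}

// Toy permissive decliner: applies an "i → a" ending rule to any name.
const permissive: Decline = (_caseName, name) =>
  name.endsWith("i") ? name.slice(0, -1) + "a" : name;

const strict = strictOnly(permissive, new Set(["Gauti"]));

console.log(permissive("þf", "Luigi")); //=> "Luiga" (pattern applied to a foreign name)
console.log(strict("þf", "Luigi")); //=> "Luigi" (left unchanged)
console.log(strict("þf", "Gauti")); //=> "Gauta"
```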

Passing a name in the wrong case

Beygla operates on the assumption that names provided to it are in the nominative case (nefnifall). If a name provided to beygla is in any other case, an incorrect result is extremely likely.

What happens if beygla does not find a pattern?

If a name has an ending that beygla does not recognize, beygla leaves that name unchanged.

Do note that beygla attempts to apply the case to every name in a full name (first, middle, and last name) individually. This means that some names in a full name might have a case applied, and some not.
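
The per-name behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not beygla's source: `declineFullName` and `DeclineOne` are hypothetical, and the stub only knows two dative forms taken from the examples in this README.

```ts
// Hypothetical signature: returns null when no pattern matches the name.
type DeclineOne = (name: string) => string | null;

// Declines each part of a full name on its own and keeps
// any part without a matching pattern exactly as it was given.
function declineFullName(fullName: string, declineOne: DeclineOne): string {
  return fullName
    .trim()
    .split(/\s+/)
    .map((part) => declineOne(part) ?? part)
    .join(" ");
}

// Stub dative lookup, for illustration only.
const DATIVE_FORMS = new Map<string, string>([
  ["Helga", "Helgu"],
  ["Smáradóttir", "Smáradóttur"],
]);
const stubDative: DeclineOne = (name) => DATIVE_FORMS.get(name) ?? null;

console.log(declineFullName("Helga Quinn Smáradóttir", stubDative));
//=> "Helgu Quinn Smáradóttur"
```

In this stub, the middle name has no entry, so it passes through untouched while the first and last names are declined.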