Closes #14

What

Adds a new strict version of beygla, which is imported from beygla/strict:

```ts
import { applyCase } from "beygla/strict";
```

There are two main differences between beygla and beygla/strict:

- beygla declines all names it finds a pattern for. beygla/strict only declines Icelandic names (as specified in icelandic-names.csv).
- beygla is <5kB, while beygla/strict is <15kB.

The roughly 3x size increase comes from beygla/strict encoding all legal Icelandic names and bundling them in the library.

Because beygla/strict only declines known names, its declensions are guaranteed to be correct. The tradeoff, aside from the bundle size, is that non-Icelandic names are never declined, even ones that beygla would decline correctly.

How

Name encoding

The set of Icelandic names is encoded in a single large string. The string contains a trie encoding that is read as follows:

Initialize an empty stack of characters.
For each character in the string:
- If `.` is encountered, the current stack represents an Icelandic name.
- If `<` is encountered, pop the last character from the stack.
- If any other character is encountered, push it onto the stack.
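The decoding loop above can be sketched in a few lines of TypeScript. `decodeNames` is a hypothetical helper, not part of beygla's API, and the sketch assumes names are stored lowercase, as in the example string below.

```typescript
// Hypothetical decoder for the trie encoding.
// "." emits the current stack as a name, "<" pops one character,
// and any other character is pushed onto the stack.
function decodeNames(encoded: string): string[] {
  const stack: string[] = [];
  const names: string[] = [];
  // for...of iterates by code point, so "á" or "ö" count as one character.
  for (const ch of encoded) {
    if (ch === ".") {
      names.push(stack.join(""));
    } else if (ch === "<") {
      stack.pop();
    } else {
      stack.push(ch);
    }
  }
  return names;
}

console.log(decodeNames("ás.t.vald.ur.<<<<r.<<in.<<eig.<<<<björn.").join(", "));
// ás, ást, ástvald, ástvaldur, ástvar, ástvin, ástveig, ástbjörn
```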

Here's an example of names encoded using this method:

```
ás.t.vald.ur.<<<<r.<<in.<<eig.<<<<björn.
```

This encodes the following names:

- Ás
- Ást
- Ástvald
- Ástvaldur
- Ástvar
- Ástvin
- Ástveig
- Ástbjörn
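Going the other way, a list of names can be turned into this encoding by popping back to the prefix each name shares with the previous one, then pushing the rest. `encodeNames` below is a hypothetical sketch, not the PR's build code. Feeding it the names above in the listed order reproduces the example string. Any order yields a decodable string, but listing names in a depth-first (preorder) walk of the trie, which plain sorted order is, writes each shared prefix only once.

```typescript
// Hypothetical encoder for the trie encoding.
function encodeNames(names: string[]): string {
  let stack: string[] = [];
  let out = "";
  for (const name of names) {
    const chars = Array.from(name); // split by code point
    // Count how many leading characters this name shares with the stack.
    let common = 0;
    while (
      common < stack.length &&
      common < chars.length &&
      stack[common] === chars[common]
    ) {
      common++;
    }
    out += "<".repeat(stack.length - common); // pop back to the shared prefix
    out += chars.slice(common).join(""); // push the remaining characters
    out += "."; // mark the stack as a complete name
    stack = chars;
  }
  return out;
}

console.log(
  encodeNames(["ás", "ást", "ástvald", "ástvaldur", "ástvar", "ástvin", "ástveig", "ástbjörn"]),
);
// ás.t.vald.ur.<<<<r.<<in.<<eig.<<<<björn.
```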

I tried several other compression methods, such as:

- Packing the bits of the 5- or 6-bit characters into bytes (Icelandic characters can be encoded using 6 bits, or 5 if you add a separate character to denote accented characters).
- Compressing long `<<<<` sequences into numbers, e.g. `<<<<` becomes `4`.
- Using uppercase to denote the end of a name, e.g. `ás.t.vald.ur.` becomes `áSTvalDuR`.
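For illustration, the last two variants can be expressed as simple string transforms over the plain encoding. `compressPops` and `uppercaseEnds` are hypothetical helpers, not code from this PR. `uppercaseEnds` assumes every `.` directly follows a letter, which holds for the example string but is not checked.

```typescript
// Hypothetical helpers illustrating two of the rejected variants.

// Replace each run of "<" with its length, e.g. "<<<<" -> "4".
function compressPops(encoded: string): string {
  return encoded.replace(/<+/g, (run) => String(run.length));
}

// Drop each "." and uppercase the character before it instead.
// Assumes every "." directly follows a letter (not checked here).
function uppercaseEnds(encoded: string): string {
  return encoded.replace(/([^.<])\./g, (_match: string, ch: string) => ch.toUpperCase());
}

console.log(compressPops("ás.t.vald.ur.<<<<r.<<in.<<eig.<<<<björn."));
// ás.t.vald.ur.4r.2in.2eig.4björn.
console.log(uppercaseEnds("ás.t.vald.ur."));
// áSTvalDuR
```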

All of these methods reduced the raw size of the string. However, each of them made gzip compression less effective and resulted in a net increase in compressed size. For that reason, we stick with the super-simple encoding.
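A comparison like this can be reproduced by measuring both the raw and gzipped size of each candidate encoding. The sketch below assumes Node.js (`gzipSync` from `node:zlib`) and a hypothetical `measure` helper. On a string as short as the example, gzip's fixed overhead dominates, so meaningful numbers require running it on the full names string; no result is asserted here.

```typescript
import { Buffer } from "node:buffer";
import { gzipSync } from "node:zlib";

// Hypothetical helper: raw UTF-8 size and gzipped size of an encoding.
function measure(encoded: string): { raw: number; gzipped: number } {
  return {
    raw: Buffer.byteLength(encoded, "utf8"),
    gzipped: gzipSync(encoded).length,
  };
}

const plain = "ás.t.vald.ur.<<<<r.<<in.<<eig.<<<<björn.";
// Variant: runs of "<" replaced by their length.
const runLength = plain.replace(/<+/g, (run) => String(run.length));

console.log("plain", measure(plain));
console.log("run-length", measure(runLength));
```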

Add `setPredicate` to `beygla`

To avoid polluting the interface of applyCase, beygla exposes a new, undocumented setPredicate export. It takes a predicate that determines whether or not a name is declined.
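A module-level predicate hook like this can be sketched as follows. This is a hypothetical illustration, not beygla's implementation: only the name `setPredicate` comes from the PR, while `declineIfAllowed` and `toyDecline` are made up to show how a predicate can gate declension without changing a function's signature.

```typescript
// Hypothetical sketch of a module-level predicate hook.
type NamePredicate = (name: string) => boolean;

// Default: every name passes the predicate.
let predicate: NamePredicate = () => true;

function setPredicate(fn: NamePredicate): void {
  predicate = fn;
}

// Illustrative gate around a declension step: names rejected by the
// predicate are returned unchanged.
function declineIfAllowed(name: string, decline: (name: string) => string): string {
  return predicate(name) ? decline(name) : name;
}

// Toy "declension" that appends a suffix, for demonstration only.
const toyDecline = (name: string) => name + "ar";

setPredicate((name) => name === "Jóhannes");
console.log(declineIfAllowed("Jóhannes", toyDecline)); // Jóhannesar
console.log(declineIfAllowed("Smith", toyDecline)); // Smith
```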

beygla/strict uses this by providing a predicate and re-exporting beygla:

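```ts
import { setPredicate } from "./beygla";

// Returns true if `name` is one of the bundled Icelandic names.
function isIcelandicName(name: string): boolean {
  // ...
}

// Only names accepted by the predicate are declined.
setPredicate(isIcelandicName);

// Re-export beygla's API unchanged.
export * from "./beygla";
```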

This guarantees that the API for beygla and beygla/strict stays the same.
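The body of isIcelandicName is elided in the snippet. One plausible shape, sketched here as an assumption rather than taken from the PR, decodes the bundled trie string once into a `Set` and checks membership. `ENCODED_NAMES` and `decodeToSet` are hypothetical, the stand-in string is just the encoding example, and the sketch assumes the predicate receives a single name and that names are stored lowercase.

```typescript
// Stand-in for the bundled string; the real one encodes every name.
const ENCODED_NAMES = "ás.t.vald.ur.<<<<r.<<in.<<eig.<<<<björn.";

// Decode the trie encoding into a Set for fast lookups.
function decodeToSet(encoded: string): Set<string> {
  const names = new Set<string>();
  const stack: string[] = [];
  for (const ch of encoded) {
    if (ch === ".") names.add(stack.join(""));
    else if (ch === "<") stack.pop();
    else stack.push(ch);
  }
  return names;
}

let nameSet: Set<string> | undefined;

// Assumes a single name is passed and names are stored lowercase.
function isIcelandicName(name: string): boolean {
  // Decode lazily so the cost is paid on first use, not at import time.
  const set = nameSet ?? (nameSet = decodeToSet(ENCODED_NAMES));
  return set.has(name.toLowerCase());
}

console.log(isIcelandicName("Ástvaldur")); // true
console.log(isIcelandicName("Smith")); // false
```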

Drive-by

Handle multiple name categories for a single entry in BÍN data

There are 3 entries in the BÍN data that contain multiple word categories, one of which was added since beygla was last updated.

Instead of filtering them out, as is currently done, we now treat multiple categories for a single entry as valid.

alexharri / beygla

Add 'beygla/strict' module #15
