oddsson closed this issue 8 months ago.
Hey,
I'd love to hear more about the use case. Beygla declines Carlos as (nf Carlos, þf Carlos, þgf Carlosi, ef Carlosar) which seems right to me. Is that not correct? If so, why not?
I'm not implying that this feature is not valuable or that I don't want to support it, I'm just curious as to why disabling declension for foreign names is desirable.
Implementation-wise, the main challenge is determining whether a name is Icelandic. That requires encoding the set of Icelandic names and including it in the bundle.
Encoding the set of names will take up some kilobytes, so I would be hesitant to include it in the default `beygla` export (I imagine it would at least double or triple the size of the library). However, we could add a "strict" version of the module:
```ts
import { applyCase } from "beygla/strict";
```
Here, the `beygla/strict` module would contain the encoded set of names and do something like this:
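```ts
import { applyCase as originalApplyCase } from "./beygla";

function isIcelandicName(name: string): boolean {
  // ... look the name up in the encoded set of Icelandic names
}

export function applyCase(
  caseStr: Parameters<typeof originalApplyCase>[0],
  name: string,
): string {
  // Conditional behavior based on 'isIcelandicName': only decline known names
  return isIcelandicName(name) ? originalApplyCase(caseStr, name) : name;
}
```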
Anyway, I'll explore how much we can compress the list of Icelandic names. It seems like a fun problem to solve.
Quick update: a naive trie encoding of the Icelandic name list yields a size of ~10 kB gzipped:
```
Created file 'names-ser.txt'
Size: 46.02 kB
Gzip size: 10.61 kB (23.05%)
```
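For a sense of what a "naive trie encoding" of a name list can look like, here is a minimal TypeScript sketch. It is not beygla's actual encoder: the text format (`!` marks the end of a name, parentheses wrap a node's children) and the names `buildTrie`, `serialize`, `deserialize` and `hasName` are hypothetical. It assumes names never contain `!`, `(` or `)` and use only BMP characters, and it will not reproduce the sizes reported above.

```ts
// Hypothetical sketch of a naive trie encoding for a list of names.
// The text format is an illustrative assumption, not beygla's actual format:
//   each child is written as its character, then "!" if a name ends there,
//   then its own children wrapped in "(" and ")".
// Assumes names never contain "!", "(" or ")" and use only BMP characters
// (true for Icelandic letters such as "ð", "þ" and "æ").

interface TrieNode {
  end: boolean; // true if a name ends at this node
  children: Map<string, TrieNode>;
}

function createNode(): TrieNode {
  return { end: false, children: new Map() };
}

function buildTrie(names: string[]): TrieNode {
  const root = createNode();
  for (const name of names) {
    let node = root;
    for (const ch of name) {
      let next = node.children.get(ch);
      if (!next) {
        next = createNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.end = true;
  }
  return root;
}

// Shared prefixes are written once, which is where the savings come from.
function serialize(node: TrieNode): string {
  let out = "";
  for (const [ch, child] of node.children) {
    out += ch;
    if (child.end) out += "!";
    if (child.children.size > 0) out += "(" + serialize(child) + ")";
  }
  return out;
}

function deserialize(input: string): TrieNode {
  let i = 0;
  // Parses sibling nodes until the parent's closing ")" (or end of input).
  function parseChildren(node: TrieNode): void {
    while (i < input.length && input[i] !== ")") {
      const child = createNode();
      node.children.set(input[i++], child);
      if (input[i] === "!") {
        child.end = true;
        i++;
      }
      if (input[i] === "(") {
        i++; // skip "("
        parseChildren(child);
        i++; // skip ")"
      }
    }
  }
  const root = createNode();
  parseChildren(root);
  return root;
}

// Membership test: walk the trie and check that the last node ends a name.
function hasName(root: TrieNode, name: string): boolean {
  let node = root;
  for (const ch of name) {
    const next = node.children.get(ch);
    if (!next) return false;
    node = next;
  }
  return node.end;
}

const encoded = serialize(buildTrie(["Anna", "Ari", "Arna"]));
console.log(encoded); // A(n(n(a!))r(i!n(a!)))
console.log(hasName(deserialize(encoded), "Carlos")); // false
```

This only merges shared prefixes; a more compact structure would also merge shared suffixes (a DAWG), at the cost of a more involved encoder.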
Hey @oddsson,
I've created a PR that implements `beygla/strict` (see #15). Would this implementation work for your use case?
PS: Feel free to review the PR if you've got the time!
Hey @alexharri, sorry for the radio silence 🤐
Thanks so much for acting on this. You are absolutely correct, beygla declines Carlos correctly. However, we are using beygla in a public-sector project, and our users care a lot about grammatically correct Icelandic. They do not want foreign names to be declined, and since there is no way for us to determine the nationality behind a name, we decline everything. As a result, our users manually "correct" foreign names after the fact, and foreign names are very common in our system.
I'll try `beygla/strict` in our project today or tomorrow and report back 🤝 Thanks again.
This would definitely work for our use case. If you are happy, I'd really like to see this merged so we can start using it 💯
@oddsson `beygla@1.4.0` has been released to npm. Let me know if you run into any issues!
Hi! 👋
Would it be possible to add an option to `applyCase` to control whether beygla tries to apply case to foreign names?

Example
Current functionality:
```ts
applyCase("þgf", "Carlos"); // => "Carlosi"
```
Wanted functionality:
```ts
applyCase("þgf", "Carlos", { applyToForeignNames: false }); // => "Carlos"
```
Maybe we could do a lookup in /data/icelandic-names.csv if applyToForeignNames is set to false 🤷
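A minimal sketch of how the proposed option could behave, assuming the name list is loaded into a `Set<string>`. Everything here is hypothetical: `applyToForeignNames` is the proposed option, not an existing beygla API; `parseNameList` assumes one name per row in the CSV's first column with no header row (the real format of `/data/icelandic-names.csv` may differ); and `decline` stands in for beygla's `applyCase` so the sketch runs on its own. It checks the whole input string, so a full name would need to be split into parts first.

```ts
// Hypothetical sketch of the proposed `applyToForeignNames` option.
// `decline` stands in for beygla's applyCase so the sketch runs on its own.
// Assumes the CSV holds one name per row in its first column, with no header row.

type Case = "nf" | "þf" | "þgf" | "ef";

interface ApplyCaseOptions {
  // When false, names missing from the Icelandic name list are returned unchanged.
  applyToForeignNames?: boolean;
}

type DeclineFn = (caseStr: Case, name: string) => string;

function parseNameList(csv: string): Set<string> {
  const names = new Set<string>();
  for (const line of csv.split(/\r?\n/)) {
    const first = line.split(",")[0].trim();
    if (first !== "") names.add(first);
  }
  return names;
}

function applyCaseWithOptions(
  decline: DeclineFn,
  icelandicNames: Set<string>,
  caseStr: Case,
  name: string,
  options: ApplyCaseOptions = {},
): string {
  // Defaulting to true preserves the current behavior.
  const { applyToForeignNames = true } = options;
  if (!applyToForeignNames && !icelandicNames.has(name)) {
    return name;
  }
  return decline(caseStr, name);
}

// Example with a stand-in decline function (not beygla's real rules).
const names = parseNameList("Anna\nJón\n");
const fakeDecline: DeclineFn = (caseStr, name) =>
  caseStr === "þgf" && name === "Carlos" ? "Carlosi" : name;

console.log(applyCaseWithOptions(fakeDecline, names, "þgf", "Carlos")); // Carlosi
console.log(
  applyCaseWithOptions(fakeDecline, names, "þgf", "Carlos", { applyToForeignNames: false }),
); // Carlos
```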