charlesLoder / hebrew-transliteration

A tool for transliterating Hebrew
https://www.npmjs.com/package/hebrew-transliteration
MIT License
37 stars, 14 forks

`ADDITIONAL_FEATURES` leaving stray shin/sin dot characters #67

Closed: m-yac closed this issue 1 year ago

m-yac commented 1 year ago

I was trying to create an ADDITIONAL_FEATURES entry which changes a final patah-yod or qamats-yod to "ai" and came up with:

{ FEATURE: "word",
  HEBREW: "([\u{05B7}\u{05B8}])י$",
  TRANSLITERATION: "$1i" },
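For reference, here is how that pattern behaves on its own in plain JavaScript. This sketch assumes, without confirming it from the library's source, that `HEBREW` is compiled with the `u` flag and that `TRANSLITERATION` follows the `$1` syntax of `String.prototype.replace`. `applyFinalAi` is a hypothetical name. The test strings are built from explicit code points so the order of vowel and yod is unambiguous.

```javascript
// Assumption: HEBREW becomes a RegExp with the "u" flag and TRANSLITERATION is
// used as a String.prototype.replace pattern. Same rule as above, yod escaped.
const HEBREW = "([\u{05B7}\u{05B8}])\u05D9$"; // patah or qamats, then a final yod
const TRANSLITERATION = "$1i";                // keep the captured vowel, yod -> "i"

const finalAiRule = new RegExp(HEBREW, "u");

// Hypothetical helper: apply the rule to a raw string.
function applyFinalAi(text) {
  return text.replace(finalAiRule, TRANSLITERATION);
}

// Explicit code points, so the mark order is exactly what we say it is.
const dalet = "\u05D3", yod = "\u05D9";
const patah = "\u05B7", qamats = "\u05B8", hiriq = "\u05B4";

console.log(applyFinalAi(dalet + patah + yod) === dalet + patah + "i");   // true
console.log(applyFinalAi(dalet + qamats + yod) === dalet + qamats + "i"); // true
console.log(applyFinalAi(dalet + hiriq + yod) === dalet + hiriq + yod);   // true (no match)
```

The capture group carries the patah or qamats point through unchanged; only the yod is replaced by "i".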

However, this seems to have strange effects on the rest of the word when it gets applied.

What is definitely a bug is that, when this rule is applied, any shin/sin dot characters are left in the final string! The tiny dot is hard to see at first, but for example:

transliterate("שַׁדַּי", my_schema) === "shׁadai"
transliterate("שַׁדַּי", my_schema).charCodeAt(2).toString(16) === "5c1" // SHIN_DOT
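Because a leftover dot is nearly invisible in most fonts, a small checker can help when testing a schema. The sketch below is a hypothetical helper, not part of hebrew-transliteration: it scans an output string for any code point in the Hebrew Unicode block (U+0590 to U+05FF) and reports its index and hex value, generalizing the `charCodeAt` check above.

```javascript
// Hypothetical helper: list every Hebrew-block code point (U+0590..U+05FF)
// still present in a transliterated string, with its index and hex value.
function findHebrewLeftovers(output) {
  const leftovers = [];
  for (let i = 0; i < output.length; i++) {
    const code = output.charCodeAt(i); // all Hebrew-block chars are in the BMP
    if (code >= 0x0590 && code <= 0x05ff) {
      leftovers.push({ index: i, hex: code.toString(16) });
    }
  }
  return leftovers;
}

// The output reported in this issue, with the shin dot (U+05C1) written out.
const reported = "sh\u05C1adai";
console.log(findHebrewLeftovers(reported)); // [ { index: 2, hex: '5c1' } ]
console.log(findHebrewLeftovers("shadai")); // []
```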

However, as I continued to experiment, I noticed that when this rule is applied, the rest of the word is often transliterated incorrectly in several other ways: all dageshes are ignored, all shevas are treated as vocal, all shureqs are treated as vavs, all other ADDITIONAL_FEATURES are ignored, and probably more. As a quick demonstration with nonsense words:

transliterate("בַּי", my_schema) === "vai"
transliterate("גַרְגַי", my_schema) === "garᵉgai"
transliterate("קוּמַי", my_schema) === "kwmai"

That said, is it possible that these latter observations are expected? Is the "word" feature really only meant to be used for whole-word transliteration? If so, this nonsensical behavior is fine, since I'm doing something that isn't supposed to be done. If that's the case, though, how would I write the rule I'm looking for?

charlesLoder commented 1 year ago

Is it the case that the "word" feature is really only meant to be used for whole-word transliteration?

Exactly. I use it for something like this:

transliterate("אֶל־מֹשֶׁ֣ה", {
  ADDITIONAL_FEATURES: [
    {
      FEATURE: "word",
      HEBREW: "מֹשֶׁה",
      TRANSLITERATION: "Moses"
    }
  ]
})
// ʾel-Moses

For your case, I would use the "syllable" feature:

heb.transliterate("שַׁדַּי", {
  SHIN: "sh",
  ADDITIONAL_FEATURES: [
    {
      FEATURE: "syllable",
      HEBREW: "([\u{05B7}\u{05B8}])י$",
      TRANSLITERATION: "$1i"
    }
  ]
})
// shadai
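If the "syllable" feature tests each syllable separately (an assumption here, not something stated in this thread), then `$` in the pattern anchors to the end of a syllable rather than the end of the word. The toy model below, with a hypothetical `applyPerSyllable` helper and hand-made syllable splits, shows what that would mean: the rule rewrites the last syllable of שַׁדַּי, but it would equally fire on a word-internal syllable whose text ends in patah or qamats followed directly by yod.

```javascript
// Toy model, NOT the library's implementation: assumes a "syllable" rule is
// tested against each syllable's text on its own, so "$" anchors to the end
// of the syllable rather than the end of the word.
const syllableRule = new RegExp("([\u{05B7}\u{05B8}])\u05D9$", "u");

function applyPerSyllable(syllables, pattern, replacement) {
  // Every syllable is matched independently of its neighbours.
  return syllables.map((syl) => syl.replace(pattern, replacement));
}

// Hand-made split of שַׁדַּי: [shin, shin dot, patah] [dalet, dagesh, patah, yod]
const shaddai = ["\u05E9\u05C1\u05B7", "\u05D3\u05BC\u05B7\u05D9"];
const out1 = applyPerSyllable(shaddai, syllableRule, "$1i");
console.log(out1[0] === shaddai[0]);            // true: first syllable untouched
console.log(out1[1] === "\u05D3\u05BC\u05B7i"); // true: final yod -> "i"

// Made-up word whose FIRST syllable ends in patah + yod: the rule fires there too.
const medial = ["\u05D3\u05B7\u05D9", "\u05D3\u05B8"];
const out2 = applyPerSyllable(medial, syllableRule, "$1i");
console.log(out2[0] === "\u05D3\u05B7i");       // true
```

If words like that matter for your schema, it is worth checking such a case against the real library before relying on the rule.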

Let me know how that works.