charlesLoder / hebrew-transliteration

A tool for transliterating Hebrew
https://www.npmjs.com/package/hebrew-transliteration
MIT License

Add `PASS_THROUGH` option #53

Closed by charlesLoder 1 year ago

charlesLoder commented 1 year ago

For an ADDITIONAL_FEATURE, writing the callback can get messy:

const heb = require("./dist/index");
const rules = require("./dist/rules");

const result = heb.transliterate("בְּרֵאשִׁ֖ית וַיַּבְדֵּל", {
  ADDITIONAL_FEATURES: [
    {
      // matches any sheva in a syllable that is NOT preceded by a vowel character
      HEBREW: "(?<![\u{05B1}-\u{05BB}\u{05C7}].*)\u{05B0}",
      FEATURE: "syllable",
      TRANSLITERATION: function (syllable, _hebrew, schema) {
        const next = syllable.next;
        // discrepancy here: in havarotjs SHEVA is simply the character
        // whereas transliteration is concerned with a specific sheva, a vocal sheva
        // optional chaining guards against a syllable with no following syllable
        const nextVowel = next?.vowelName === "SHEVA" ? "VOCAL_SHEVA" : next?.vowelName;

        if (next && nextVowel) {
          const vowel = schema[nextVowel] || "";
          // replaceAndTransliterate is an internal helper function
          return rules.replaceAndTransliterate(syllable.text, new RegExp("\u{05B0}", "u"), vowel, schema);
        }

        return syllable.text;
      }
    }
  ]
});

// bērēʾšît wayyabdēl

Namely, you have to import an internal rule helper and call it yourself.

The PASS_THROUGH option could work like this:

const result = heb.transliterate("בְּרֵאשִׁ֖ית וַיַּבְדֵּל", {
  ADDITIONAL_FEATURES: [
    {
      // matches any sheva in a syllable that is NOT preceded by a vowel character
      HEBREW: "(?<![\u{05B1}-\u{05BB}\u{05C7}].*)\u{05B0}",
      FEATURE: "syllable",
      PASS_THROUGH: true,
      TRANSLITERATION: function (syllable, _hebrew, schema) {
        const next = syllable.next;
        // discrepancy here: in havarotjs SHEVA is simply the character
        // whereas transliteration is concerned with a specific sheva, a vocal sheva
        // optional chaining guards against a syllable with no following syllable
        const nextVowel = next?.vowelName === "SHEVA" ? "VOCAL_SHEVA" : next?.vowelName;

        if (next && nextVowel) {
          const vowel = schema[nextVowel] || "";
          return syllable.text.replace(new RegExp("\u{05B0}", "u"), vowel);
        }

        return syllable.text;
      }
    }
  ]
});

This way no import is needed. The callback only swaps out the sheva, and its output is passed through the remaining rules, which continue to map characters as usual. There is no need to reimplement existing logic.
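To make the idea concrete, here is a minimal, self-contained sketch of what pass-through could mean. Everything in it (`toySchema`, `mapCharacters`, `applyFeature`) is a hypothetical stand-in written for illustration, not the library's actual internals or schema values.

```javascript
// Hypothetical toy mapping; NOT the library's real schema.
const toySchema = new Map([
  ["\u{05D1}", "b"], // bet
  ["\u{05B0}", "\u{01DD}"], // sheva
  ["\u{05BC}", ""] // dagesh, ignored in this toy mapping
]);

// Map each character through the toy schema; unknown characters are kept as-is.
function mapCharacters(text, schema) {
  return [...text].map((ch) => (schema.has(ch) ? schema.get(ch) : ch)).join("");
}

// Apply a single feature to a piece of text (assumed behaviour, for illustration).
// With PASS_THROUGH, the callback's output still goes through the character rules;
// without it, the callback's output is treated as final.
function applyFeature(text, feature, schema) {
  const pattern = new RegExp(feature.HEBREW, "u");
  if (!pattern.test(text)) return mapCharacters(text, schema);
  const output = feature.TRANSLITERATION(text, feature.HEBREW, schema);
  return feature.PASS_THROUGH ? mapCharacters(output, schema) : output;
}

const feature = {
  HEBREW: "\u{05B0}",
  PASS_THROUGH: true,
  // Only replaces the sheva with Latin text; the other Hebrew characters are left alone.
  TRANSLITERATION: (text, hebrew) => text.replace(new RegExp(hebrew, "u"), "\u{0113}")
};

const input = "\u{05D1}\u{05BC}\u{05B0}"; // bet + dagesh + sheva
const passThroughResult = applyFeature(input, feature, toySchema);
const finalResult = applyFeature(input, { ...feature, PASS_THROUGH: false }, toySchema);

console.log(passThroughResult); // remaining Hebrew characters were mapped by the toy rules
console.log(finalResult); // Hebrew characters other than the sheva are still present
```

In this sketch the callback never touches the schema-driven mapping; it only edits the text, which is what lets it drop the internal helper.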

johnlockejrr commented 1 year ago

bērēʾšît wayyabdēl would be wrong because shewa is a short vowel and the b in the second word is spirantized to v.

charlesLoder commented 1 year ago

> bērēʾšît wayyabdēl would be wrong because shewa is a short vowel and the b in the second word is spirantized to v.

In the standard academic SBL transliteration, the default is actually not to mark spirantization unless necessary (SBLHS §5.1.1.4). It does seem counterintuitive, though.

charlesLoder commented 1 year ago

The other impetus for this is that on the site a schema can be downloaded as a JSON file. If there is an import statement at the top, the file is no longer valid JSON.
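A quick way to see the problem, as a small sketch: the file contents below are made up for illustration (the `"VOCAL_SHEVA"` value is an assumption), and `isValidJson` is a hypothetical helper around the standard `JSON.parse`.

```javascript
// Hypothetical file contents, for illustration only.
const withImport = 'const rules = require("./dist/rules");\n{ "VOCAL_SHEVA": "e" }';
const withoutImport = '{ "VOCAL_SHEVA": "e" }';

// Returns true when the text parses as JSON, false when JSON.parse throws.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidJson(withImport)); // false
console.log(isValidJson(withoutImport)); // true
```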