charlesLoder / hebrew-transliteration

A tool for transliterating Hebrew
https://www.npmjs.com/package/hebrew-transliteration
MIT License

double marks #61

Closed asherlporetz closed 1 year ago

asherlporetz commented 1 year ago

Not sure about the settings, but the following cases fail if inserted in the tests. The text is from Sefaria.

produces double dagesh mark instead:

    ${"sin dagesh "}   | ${"הַשָּׂדֶֽה"} | ${"haśādê"}   | ${{ DAGESH_CHAZAQ: "\u0301" }}

produces double stress mark instead:

    ${"geresh"}   | ${"עֵ֝ינֶ֗יךָ"}   | ${"ʿênêˈkā"}   | ${{ STRESS_MARKER: { location: "after-syllable", mark: "ˈ" } }}

produces "ha" instead of "ah":

    ${"furtive patach, sof pasuq"}   | ${"רֽוּחַ׃"}    | ${"rûaḥ"}

does not separate maqaf:

    ${"psalms 2:12 maqaf"}  |  ${"נַשְּׁקוּ־בַ֡ר"}  |  ${"naššǝqû-bar"}

By the way is it possible to have SILENT_SHEVA and MAPPIQ settings (default to blank strings)? For example "שַׁוְעִ֗י" can become "shavi" if silent sheva is not marked, instead of "shav,i". And "הִ֛וא" occurs often enough that it would be great to have a setting for it instead of the cpu-consuming ADDITIONAL_FEATURES; right now it translates to "hiv'" instead of just "hi". Thank you for this project!

charlesLoder commented 1 year ago

@asherlporetz

Just noting that I see this.

Had a baby a few months ago, and time on this project has slowed

asherlporetz commented 1 year ago

No rush. The library is very useful as it is. Have a great time!

charlesLoder commented 1 year ago

Found some free time!

I assume these are your own tests?

I'll try to take them one-by-one




> produces double dagesh mark instead:

The \u0301 character is being applied to ś, which already has an acute on it.

You would have to write an ADDITIONAL_FEATURE to fix that.
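As an alternative to a schema-level fix, the doubled acute can be removed from the output string after transliteration. The sketch below is not an ADDITIONAL_FEATURE and not part of the library: `collapseDoubleAcute` is a hypothetical helper, and it assumes the doubled mark shows up as two consecutive U+0301 characters once the string is decomposed.

```javascript
// Hypothetical post-processing helper, not part of hebrew-transliteration.
function collapseDoubleAcute(text) {
  return text
    .normalize("NFD") // split precomposed letters, e.g. ś becomes s + U+0301
    .replace(/\u0301{2,}/g, "\u0301") // keep a single combining acute
    .normalize("NFC"); // recompose where possible
}

// "ś" (U+015B) followed by an extra combining acute, as described above
const doubled = "ha\u015B\u0301\u0101d\u00EA";
console.log(collapseDoubleAcute(doubled) === "ha\u015B\u0101d\u00EA"); // true
```

Note that this simply drops the extra mark, so the dagesh chazaq on ś is no longer visible in the output; a different DAGESH_CHAZAQ value may be the better choice if that distinction matters.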



> produces double stress mark instead:

It's counting the geresh and the revia as stressed syllables.

What text is that from? That may be a bug, or my limited understanding of taamim (almost certainly the latter).
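To find other words that would hit the same behaviour, one option is to count the accent marks on each word before transliterating. This is only a rough sketch: `countTaamim` and `wordsWithMultipleTaamim` are hypothetical helpers, and they assume that every code point from U+0591 to U+05AE (the accent range of the Unicode Hebrew block) counts as one accent.

```javascript
// Hebrew accents (taamim) occupy U+0591 through U+05AE in Unicode.
const TAAMIM = /[\u0591-\u05AE]/g;

// Number of accent marks on a single word.
function countTaamim(word) {
  const matches = word.match(TAAMIM);
  return matches ? matches.length : 0;
}

// Words in a whitespace-separated text that carry more than one accent.
function wordsWithMultipleTaamim(text) {
  return text.split(/\s+/).filter((word) => countTaamim(word) > 1);
}
```

Any word this returns carries two or more accents, so it is a candidate for a doubled stress mark if each accent is treated as marking a stressed syllable.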



> produces "ha" instead of "ah":

Interesting, if you copy & paste from a text like Mishneh Torah, the sof pasuq after רוּחַ is actually a colon, but in Psalms 18:11 it's an actual sof pasuq.

This is definitely a bug 🐞
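The two characters look alike but are different code points: the sof pasuq is U+05C3 and the ASCII colon is U+003A. A small sketch for checking which one a copied text actually ends with (`describeFinalPunctuation` is a hypothetical helper, not something the library provides):

```javascript
const SOF_PASUQ = "\u05C3"; // HEBREW PUNCTUATION SOF PASUQ
const COLON = "\u003A"; // ASCII colon

// Reports which punctuation mark, if any, ends the text.
function describeFinalPunctuation(text) {
  const last = text.trim().slice(-1);
  if (last === SOF_PASUQ) return "sof pasuq";
  if (last === COLON) return "colon";
  return "other";
}

// resh, meteg, vav, dagesh, het, patach
const ruach = "\u05E8\u05BD\u05D5\u05BC\u05D7\u05B7";
console.log(describeFinalPunctuation(ruach + SOF_PASUQ)); // "sof pasuq"
console.log(describeFinalPunctuation(ruach + COLON)); // "colon"
```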



> does not separate maqaf:

Another bug! 🐞
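Until that is fixed, a possible workaround is to split the text on the maqaf (U+05BE) and transliterate each part separately. This is a sketch under assumptions: `transliterateAroundMaqaf` is a hypothetical wrapper, `transliterateWord` stands in for whatever function does the per-word transliteration, and handling the parts independently may change how the library treats the word boundary.

```javascript
const MAQAF = "\u05BE"; // HEBREW PUNCTUATION MAQAF

// Splits on maqaf, transliterates each part, and rejoins with a hyphen.
// `transliterateWord` is a stand-in supplied by the caller.
function transliterateAroundMaqaf(text, transliterateWord) {
  return text
    .split(MAQAF)
    .map((part) => transliterateWord(part)) // pass only the text, not the index
    .join("-");
}

// Demonstration with a stub instead of a real transliteration function
const stub = (part) => `[${part.length}]`;
console.log(transliterateAroundMaqaf("\u05D0\u05D1" + MAQAF + "\u05D2", stub)); // "[2]-[1]"
```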



> By the way is it possible to have SILENT_SHEVA and MAPPIQ settings (default to blank strings)?

I like this idea, but not totally sure I follow the first example.

> example "שַׁוְעִ֗י" can become "shavi" if silent sheva is not marked, instead of "shav,i".

With the default settings, I get:

    console.log(heb.transliterate(`שַׁוְעִ֗י`));
    // šawʿî

The ʿ character is the ayin. Is that what you mean in "shav,i"?

> And "הִ֛וא" occurs often enough that it would be great to have a setting for it instead of the cpu-consuming ADDITIONAL_FEATURES;

Yeah, the ADDITIONAL_FEATURES are not optimized at all.

I'm not too sure how to check a word like this other than just a regex, which would be pretty similar to an ADDITIONAL_FEATURE.

I'll have to mull over it. If you have any ideas, drop them here.
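For reference, one cheap way to detect the word without a full regex match is to strip the accents and compare against the bare consonants and vowel. This is only a sketch of the check, not a proposed implementation: `isHiWithVav` is a hypothetical helper, and it assumes the only marks that vary between occurrences are the taamim and the meteg.

```javascript
// Accents (U+0591 to U+05AE) plus the meteg (U+05BD).
const TAAMIM_AND_METEG = /[\u0591-\u05AE\u05BD]/g;

// he + hiriq + vav + alef, with no accents
const HI_WITH_VAV = "\u05D4\u05B4\u05D5\u05D0";

// True when the word is he-hiriq-vav-alef once accents are removed.
function isHiWithVav(word) {
  return word.replace(TAAMIM_AND_METEG, "") === HI_WITH_VAV;
}

// he + hiriq + tevir + vav + alef
console.log(isHiWithVav("\u05D4\u05B4\u059B\u05D5\u05D0")); // true
// he + vav + dagesh (shureq) + alef
console.log(isHiWithVav("\u05D4\u05D5\u05BC\u05D0")); // false
```

A caller could run this check per word and substitute a fixed transliteration before handing the rest of the text to the library.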

charlesLoder commented 1 year ago

Closing this. New issues have been created.