Closed JacobWeinbren closed 1 year ago
Who says shewa is silent in that case? Maybe in Ivrit but not in Biblical Hebrew.
On Wed, 17 May 2023 at 07:01, Jacob Weinbren @.***> wrote:
In this library, the shewa is replaced with ǝ, when in some (most) cases it should be silent.
For example, "בְּרוּכָה הַבָּאָה" produces "bǝrûkâ habbāʾâ" when it should be "brûkâ habbāʾâ".
Is there any clear way to solve this?
Thank you for making this library.
@JacobWeinbren
I'm glad you're finding the library useful!
As @johnlockejrr noted, in Biblical Hebrew the sheva in בְּרוּכָה is most certainly vocal.
In Modern Hebrew, however, the sheva would be silent, so the bet and resh would be blended together like in English.
For your case, you could set the VOCAL_SHEVA to an empty string:
transliterate("בְּרוּכָה", { VOCAL_SHEVA: "" })
// brûkâ
This can be done on the website's interface as well.
This is probably your best bet.
Though that solves your immediate use case, it will still give some incorrect results with a few words. For a word like לְבָנִים, even though it has an initial sheva like בְּרוּכָה, in Modern Hebrew, unlike בְּרוּכָה, the sheva is pronounced — lǝvanim.
If you're curious, there's more here.
Hope that helps
@charlesLoder
Is there any way to address the other cases for modern Ivrit? How would I use Syllable.medial to achieve this?
https://judaism.stackexchange.com/questions/92599/what-are-the-rules-for-shva-na
- first letter of a word.
- second of two shvas under consecutive letters.
- after a tenua gedolah(long vowel), where the long vowel has no primary stress.(also some other cases, and can even be after a short vowel where the short vowel is "lengthened"/treated as a long vowel!)
- under a dagesh
- if a shva appears under the first of two consecutive identical letters (e.g., the first lamed of halleluyah)
This is the closest I could find to a comprehensive ruleset for all the cases where sheva is pronounced.
Thank you!
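The rules listed above can be sketched as a small standalone function. This is only an illustration: the `{ consonant, vowel, dagesh }` records and the `isVocalSheva` name are hypothetical and are not the data model of hebrew-transliteration or havarotjs. The long-vowel rule is left out because it needs stress information, and any dagesh is treated as triggering the dagesh rule.

```javascript
// Sketch only: clusters are hypothetical { consonant, vowel, dagesh } records.
const SHEVA = "\u05B0";

function isVocalSheva(clusters, i) {
  const current = clusters[i];
  if (current.vowel !== SHEVA) return false;

  const prev = clusters[i - 1];
  const next = clusters[i + 1];

  // Rule: first letter of a word
  if (i === 0) return true;
  // Rule: second of two shevas under consecutive letters
  if (prev.vowel === SHEVA) return true;
  // Rule: sheva under a letter with a dagesh
  if (current.dagesh) return true;
  // Rule: first of two consecutive identical letters
  if (next && next.consonant === current.consonant) return true;

  return false;
}

// The word mishpechot, one record per consonant
const mishpechot = [
  { consonant: "\u05DE", vowel: "\u05B4", dagesh: false }, // mem + hiriq
  { consonant: "\u05E9", vowel: SHEVA, dagesh: false },    // shin + sheva
  { consonant: "\u05E4", vowel: SHEVA, dagesh: true },     // pe + dagesh + sheva
  { consonant: "\u05D7", vowel: "\u05B9", dagesh: false }, // het + holam
  { consonant: "\u05EA", vowel: null, dagesh: false },     // tav
];

console.log(mishpechot.map((_, i) => isVocalSheva(mishpechot, i)));
// [ false, false, true, false, false ]
```

Note that under these rules the word-initial sheva is always vocal, which is the Biblical Hebrew reading discussed in this thread.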
The rules in that stackexchange are for Biblical Hebrew. The first rule would apply to בְּרוּכָה making it a vocal sheva. The package already follows those rules by default, but they can be adjusted as well.
Using the JS package (it won't work well on the website), you can write an ADDITIONAL_FEATURE using a callback.
Coffin and Bolozky give a pretty detailed analysis of Modern Hebrew syllable structure in their work A Reference Grammar of Modern Hebrew
@charlesLoder I am not confident enough in Hebrew to write a callback that covers most of the rules and exceptions for the vocal shewa in modern Ivrit. Say we made the rules for long vowels, short vowels, first letters, identical letters, dagesh... there are so many potential exceptions. If I understand you correctly, this seems a significant roadblock to transcribing modern Ivrit.
@charlesLoder Using the Additional Features, would something like this be what you are looking for?
It addresses the three rules outlined in A Reference Grammar of Modern Hebrew. But what about ve - e.g. וְחָלָב? Under these rules, it would be silent?
const heb = require("hebrew-transliteration");
const brillSimple = require("hebrew-transliteration/schemas").brillSimple;
let text = "גְּדֹולִים";
let transliterated = heb.transliterate(text, brillSimple, {
ADDITIONAL_FEATURES: [
{
FEATURE: "syllable",
HEBREW: ".*\u{05B0}.*", // matches any shewa in a syllable
TRANSLITERATION: function (syllable, _hebrew, schema) {
console.log("test");
// If the shva should be vocal
if (
["י", "ל", "מ", "נ", "ר"].includes(syllable.text[0]) ||
["א", "ה", "ע"].includes(syllable.text[1])
) {
return syllable.text.replace(
new RegExp("\u{05B0}", "u"),
schema["VOCAL_SHEVA"]
);
}
// If none of the conditions are met, the shva is silent
return syllable.text.replace(new RegExp("\u{05B0}", "u"), "");
},
},
{
FEATURE: "word",
HEBREW: "\u{05B0}.*\u{05B0}", // matches any shewa that is preceded by another shewa in a word
TRANSLITERATION: function (word, _hebrew, schema) {
// Replace each shewa that is preceded by another shewa with a vocal shewa
return word.text.replace(
new RegExp("\u{05B0}.*\u{05B0}", "u"),
schema["VOCAL_SHEVA"]
);
},
},
],
});
console.log(transliterated);
@JacobWeinbren
Here is a solution:
const str = "גְּדֹולִים לְבָנִים תְּשׁוּקָה תְּאוּנָה";
console.log(
  heb.transliterate(str, {
    longVowels: false,
    ADDITIONAL_FEATURES: [
      {
        FEATURE: "syllable",
        HEBREW: "\u{05B0}",
        TRANSLITERATION: (syllable, _hebrew, schema) => {
          // if vowel is not a shewa then shewa is silent, so skip
          if (syllable.vowel !== "\u{05B0}") return syllable.text;
          // if syllable vowel is a shewa and contains one of the following, then the shewa is vocal
          if (["י", "ל", "מ", "נ", "ר"].includes(syllable.onset)) {
            return syllable.text;
          }
          // if the syllable vowel is a shewa and the next syllable contains a guttural, then the shewa is vocal
          const next = syllable?.next?.value.onset;
          if (next && ["א", "ה", "ע"].includes(next)) {
            return syllable.text;
          }
          // else the shewa is silent
          return syllable.text.replace("\u{05B0}", "");
        }
      }
    ]
  })
);
// gdôlîm lǝbānîm tšûqâ tǝʾûnâ
As for an initial vav, though the rules above would indicate that it is silent, I'm pretty sure it should be vocal.
Somewhere else you asked about הוֹלְכִים. In Modern Israeli Hebrew, a shewa after a long vowel is usually silent (like in the phrase "eykh omrim"). Theoretically, setting the longVowels option to false should work, but there is a bug I need to fix.
@charlesLoder
Thanks a lot!
One fix to your code
const next = syllable?.next?.value?.onset;
(if the syllable is the last one it checks first).
Could we write in an edge case if the word begins with ve (וְ) 'and'? It is one of the most common lexemes (I'm sure you knew this already, apologies). Something like:
{
  FEATURE: "word",
  HEBREW: "^וְ",
  TRANSLITERATION: function (word, _hebrew, schema) {
    return word.text.replace("\u{05B0}", schema["VOCAL_SHEVA"]);
  },
},
Also, Is there any way I can help you with fixing the longVowels option?
EDIT: I see what you mean - יַלְדָּה produces ylada
EDIT 2: The third rule of shva, 'A shewa preceded by a shewa is typically vocal as well' doesn't work for מֵאַרְצְךָ
Thanks again!
The longVowels option is part of the havarotjs package. I still need to make the issue for it.
(if the syllable is the last one it checks first).
Not sure I follow.
Here's an updated solution:
const str = "וְמֶלֶךְ יַלְדָּה גְּדֹולִים לְבָנִים תְּשׁוּקָה תְּאוּנָה";
console.log(
  heb.transliterate(str, {
    longVowels: false,
    ADDITIONAL_FEATURES: [
      {
        FEATURE: "syllable",
        HEBREW: /^\D{1,2}\u{05B0}/,
        TRANSLITERATION: (syllable, _hebrew, schema) => {
          // if vowel is not a shewa then shewa is silent, so skip
          if (syllable.vowel !== "\u{05B0}") {
            return syllable.text;
          }
          // if it is the first syllable and is conjunctive vav, then the shewa is vocal
          const isFirstSyllable = syllable.prev ? false : true;
          if (isFirstSyllable && syllable.text.includes("וְ")) {
            return syllable.text;
          }
          // if syllable vowel is a shewa and the onset is one of the following, then the shewa is vocal
          if (["י", "ל", "מ", "נ", "ר"].includes(syllable.onset)) {
            return syllable.text;
          }
          // if the syllable vowel is a shewa and the next syllable contains a guttural, then the shewa is vocal
          const next = syllable?.next?.value.onset;
          if (next && ["א", "ה", "ע"].includes(next)) {
            return syllable.text;
          }
          // else the shewa is silent
          return syllable.text.replace("\u{05B0}", "");
        }
      }
    ]
  })
);
// wǝmelek yaldâ gdôlîm lǝbānîm tšûqâ tǝʾûnâ
Note the updated regex — /^\D{1,2}\u{05B0}/
You can fiddle around with it.
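To see what the anchored pattern does and does not match, here is a small standalone check. It assumes the syllable text is ordered consonant, then dagesh (if any), then sheva; the code points are written as escapes so that order is explicit. The variable names are only for illustration.

```javascript
// Assumption: marks are ordered consonant, dagesh (optional), sheva.
const anchored = /^\D{1,2}\u{05B0}/u;
const unanchored = /\D{1,2}\u{05B0}/u;

const le = "\u05DC\u05B0";              // lamed + sheva
const ge = "\u05D2\u05BC\u05B0";        // gimel + dagesh + sheva
const yal = "\u05D9\u05B7\u05DC\u05B0"; // yod + patah + lamed + sheva

console.log(anchored.test(le));    // true
console.log(anchored.test(ge));    // true
console.log(anchored.test(yal));   // false: the sheva is the fourth code point
console.log(unanchored.test(yal)); // true: matches patah + lamed + sheva mid-syllable

// A regex literal used on its own needs the u flag for \u{...} escapes.
console.log(/\u{05B0}/.test("\u05B0"));  // false
console.log(/\u{05B0}/u.test("\u05B0")); // true
```

The last two lines only concern regexes you call yourself (for example inside a callback with `replace`); how the library compiles the HEBREW pattern is not shown in this thread.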
You can also extend a premade schema with the spread syntax:
heb.transliterate(str, {
  ...brillSimple,
  ADDITIONAL_FEATURES: [
    // all the stuff
  ]
});
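One thing to keep in mind when extending a schema this way: keys written after the spread replace the base values, they are not merged. A minimal sketch with a hypothetical `baseSchema` object standing in for a premade schema (its contents here are made up for illustration):

```javascript
// baseSchema is a hypothetical stand-in for a premade schema such as brillSimple.
const baseSchema = {
  VOCAL_SHEVA: "ǝ",
  longVowels: true,
  ADDITIONAL_FEATURES: [{ FEATURE: "word", HEBREW: "a", TRANSLITERATION: "b" }],
};

const myFeature = { FEATURE: "syllable", HEBREW: "c", TRANSLITERATION: "d" };

// Keys written after the spread override the base values.
const replaced = {
  ...baseSchema,
  longVowels: false,
  ADDITIONAL_FEATURES: [myFeature],
};

// To keep any features the base schema already defines, concatenate explicitly.
const merged = {
  ...baseSchema,
  longVowels: false,
  ADDITIONAL_FEATURES: [...(baseSchema.ADDITIONAL_FEATURES ?? []), myFeature],
};

console.log(replaced.ADDITIONAL_FEATURES.length); // 1
console.log(merged.ADDITIONAL_FEATURES.length);   // 2
console.log(baseSchema.longVowels);               // true (the base object is not mutated)
```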
Thank you for this. Regarding the longVowels bug, if you open an issue for it, I will try my best to help. How often do you think it comes up in Hebrew?
const next = syllable?.next?.value.onset;
^
TypeError: Cannot read properties of undefined (reading 'onset')
at TRANSLITERATION (/Users/jacobweinbren/Desktop/test/test.js:32:40)
at sylRules (/Users/jacobweinbren/Desktop/test/node_modules/hebrew-transliteration/dist/rules.js:274:70)
at /Users/jacobweinbren/Desktop/test/node_modules/hebrew-transliteration/dist/transliterate.js:83:50
at Array.map (<anonymous>)
at /Users/jacobweinbren/Desktop/test/node_modules/hebrew-transliteration/dist/transliterate.js:83:18
at Array.map (<anonymous>)
at Object.transliterate (/Users/jacobweinbren/Desktop/test/node_modules/hebrew-transliteration/dist/transliterate.js:78:10)
at Object.<anonymous> (/Users/jacobweinbren/Desktop/test/test.js:7:6)
at Module._compile (node:internal/modules/cjs/loader:1159:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
This is the error I get without adding the question mark. I think it is an error with the last syllable (which does not have a next).
Without going on too much of a tangent from the original issue (which you have very kindly solved), some exceptions still do not work due to the third rule, such as:
מִשְׁפְּחֹת = mishpechot
To solve this, I implemented this...hopefully this is ok?
{
  FEATURE: "word", // Apply rule at the word level
  HEBREW: /\u{05B0}/, // Match Shva sign anywhere in word
  TRANSLITERATION: function (word, _hebrew, schema) {
    // Regular expression to match two consecutive Shvas with possible intervening diacritics
    const regex = /([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})/gu;
    // Replace each match with first and third group followed by a vocal Shva
    const replaced = word.text.replace(regex, `$1$3${schema["VOCAL_SHEVA"]}`);
    return replaced; // Return transliterated word
  },
},
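The regex above can be tried on its own, outside the library. In this sketch "ǝ" stands in for `schema["VOCAL_SHEVA"]`, `markSecondSheva` is a hypothetical helper name, and the input strings assume each sheva is the last mark on its consonant (consonant, shin dot or dagesh, then sheva). With a different mark order the pattern may not match.

```javascript
const doubleSheva =
  /([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})/gu;

// Drops the first sheva and replaces the second with the vocal marker.
const markSecondSheva = (text, vocal = "ǝ") =>
  text.replace(doubleSheva, `$1$3${vocal}`);

// mishpechot: mem hiriq, shin shin-dot sheva, pe dagesh sheva, het holam, tav
const mishpechot =
  "\u05DE\u05B4\u05E9\u05C1\u05B0\u05E4\u05BC\u05B0\u05D7\u05B9\u05EA";
const mishpechotExpected =
  "\u05DE\u05B4\u05E9\u05C1\u05E4\u05BC" + "ǝ" + "\u05D7\u05B9\u05EA";

// anglit: alef patah, nun sheva, gimel dagesh sheva, lamed hiriq, yod, tav
const anglit =
  "\u05D0\u05B7\u05E0\u05B0\u05D2\u05BC\u05B0\u05DC\u05B4\u05D9\u05EA";
const anglitExpected =
  "\u05D0\u05B7\u05E0\u05D2\u05BC" + "ǝ" + "\u05DC\u05B4\u05D9\u05EA";

console.log(markSecondSheva(mishpechot) === mishpechotExpected); // true
console.log(markSecondSheva(anglit) === anglitExpected);         // true
```

The second case shows that the pattern treats every pair of consecutive shevas the same way, so a word like anglit also gets a vocal marker on its second sheva.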
@charlesLoder We can use syllable.prev to make the shva after a long vowel (hireq, qamets, tsere, holam, and qibbuts) silent.
{
  FEATURE: "syllable", // Work on the syllable level
  HEBREW: /\u{05B0}/, // Match Shva in a syllable
  TRANSLITERATION: function (syllable, _hebrew, schema) {
    // If Shva follows a long vowel, make it silent
    if (
      syllable.prev &&
      ["\u{05B4}", "\u{05B8}", "\u{05B5}", "\u{05B9}", "\u{05BB}"].includes(syllable.prev.vowel) &&
      syllable.vowel === "\u{05B0}"
    ) {
      return syllable.text.replace(/\u{05B0}/u, '');
    }
    // Make second of consecutive Shvas vocal
    const replaced = syllable.text.replace(
      /([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})/gu,
      `$1$3${schema["VOCAL_SHEVA"]}`
    );
    return replaced;
  },
},
@charlesLoder Here is the full code to handle modern Ivrit. Thanks so much for helping!!
EDIT: This still doesn't work for מַה שְּׁלוֹמֵךְ, not sure why.
const heb = require("hebrew-transliteration");
const brillSimple = require("hebrew-transliteration/schemas").brillSimple;
// Arrays of characters for different conditions
const longVowels = ["\u{05B4}", "\u{05B8}", "\u{05B5}", "\u{05B9}", "\u{05BB}"];
const gutturals = ["א", "ה", "ע"];
const vocalChars = ["י", "ל", "מ", "נ", "ר"];
const schema = {
...brillSimple,
longVowels: false,
ADDITIONAL_FEATURES: [
{
FEATURE: "word",
// Regular expression to match Shva in a word
HEBREW: /\D{1,2}\u{05B0}/,
// Replace consecutive Shvas with a vocal Shva
TRANSLITERATION: (word, _, schema) =>
word.text.replace(
/([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})/gu,
`$1$3${schema["VOCAL_SHEVA"]}`
),
},
{
FEATURE: "syllable",
// Regular expression to match Shva at the start of a syllable
HEBREW: /^\D{1,2}\u{05B0}/,
// Handle rules of Shva in a syllable
TRANSLITERATION: (syllable, _, schema) => {
const isFirstSyllable = !syllable.prev;
const nextOnset = syllable?.next?.value?.onset;
// If Shva follows a long vowel, make it silent
if (
syllable.vowel === "\u{05B0}" &&
syllable.prev &&
longVowels.includes(syllable.prev.vowel)
) {
return syllable.text.replace(/\u{05B0}/u, "");
}
// If Shva is in the first syllable and is a conjunctive vav, make it vocal
if (isFirstSyllable && syllable.text.includes("וְ")) {
return syllable.text;
}
// If Shva follows one of the characters in vocalChars, make it vocal
if (vocalChars.includes(syllable.text.charAt(0))) {
return syllable.text;
}
// If the next syllable contains a guttural, make Shva vocal
if (nextOnset && gutturals.includes(nextOnset)) {
return syllable.text;
}
// If none of the above conditions are met, return the original syllable text
return syllable.text;
},
},
],
};
console.log(heb.transliterate("בְּרוּכָה הַבָּאָה", schema));
console.log(heb.transliterate("הוֹלְכִים", schema));
console.log(heb.transliterate("יַלְדָּה", schema));
console.log(heb.transliterate("מִשְׁפְּחֹת", schema));
console.log(heb.transliterate("וְאָנִי", schema));
Produces
brukha habba’a
holkhim
yalda
mishpᵉḥot
vᵉ’ani
I will try and work on this more tomorrow.
@charlesLoder I don't understand why this causes the letters to switch in yalda
{
FEATURE: "syllable",
// Regular expression to match Shva in a word
HEBREW: /\D{1,2}\u{05B0}/,
// Replace consecutive Shvas with a vocal Shva
TRANSLITERATION: (syllable, _, schema) => {
return syllable.text;
},
},
It makes it hard to fix the מַה שְּׁלוֹמֵךְ problem
I don't understand why this causes the letters to switch in yalda
Normally, that issue arises from this block
It's just a poor design choice I made that needs to be refactored.
This is the error I get without adding the question mark. I think it is an error with the last syllable (which does not have a next).
You'll want to have the question marks for optional chaining; they're useful to see if the value exists or not.
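A small standalone sketch of how the two spellings differ. The objects below are hypothetical stand-ins, not the real havarotjs shapes. The stack trace above is consistent with `next` being present while its `value` is undefined, but that reading is an assumption.

```javascript
// Hypothetical stand-ins; the real havarotjs objects may be shaped differently.
const noNext = { text: "a" };                                    // next is missing entirely
const emptyNext = { text: "b", next: {} };                       // next exists, value does not
const fullNext = { text: "c", next: { value: { onset: "x" } } }; // everything present

const unsafe = (syl) => syl?.next?.value.onset;
const safe = (syl) => syl?.next?.value?.onset;

console.log(unsafe(noNext));  // undefined: the chain short-circuits at next
console.log(safe(emptyNext)); // undefined
console.log(safe(fullNext));  // "x"

try {
  unsafe(emptyNext);
} catch (err) {
  // Reading .onset from an undefined value throws
  console.log(err instanceof TypeError); // true
}
```

So a missing `next` alone does not throw with either spelling; the extra `?.` after `value` is what guards against a `next` whose `value` is undefined.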
@charlesLoder I can't find any obvious solution other than reversing clusters, but this causes even more problems.
With the final return changed to return syllable.text.replace(/\u{05B0}/u, "");
I am getting this
brukha habba’a // correct
holkhim // correct
ylada // should be: yalda
mishpᵉḥot // correct
vᵉ’ani // correct
mah shlomkhe // should be: mah shlomekh
Thanks for posting the longVowels issue, I will take a look.
@charlesLoder Managed to fix the word ordering bug and potentially the long vowel issue you raised - but אַנְגְּלִית produces issues. Not sure what I can do about these exceptions, or is there a rule I am missing?
@JacobWeinbren
What's your schema look like now?
@charlesLoder It looks like this - thanks a lot!
const heb = require("hebrew-transliteration");
const brillSimple = require("hebrew-transliteration/schemas").brillSimple;
// Arrays of characters for different conditions
const longVowels = ["\u{05B4}", "\u{05B8}", "\u{05B5}", "\u{05B9}", "\u{05BB}"];
const gutturals = ["א", "ה", "ע"];
const vocalChars = ["י", "ל", "מ", "נ", "ר"];
const schema = {
...brillSimple,
longVowels: false,
ADDITIONAL_FEATURES: [
{
FEATURE: "word",
// Regular expression to match Shva in a word
HEBREW: /\D{1,2}\u{05B0}/,
// Replace consecutive Shvas with a vocal Shva
TRANSLITERATION: (word, _, schema) =>
word.text.replace(
/([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})/gu,
`$1$3${schema["VOCAL_SHEVA"]}`
),
},
{
FEATURE: "syllable",
// Regular expression to match Shva at the start of a syllable
HEBREW: /\D{1,2}\u{05B0}/,
// Handle rules of Shva in a syllable
TRANSLITERATION: (syllable, _, schema) => {
const isFirstSyllable = !syllable.prev;
const nextOnset = syllable?.next?.value?.onset;
if (
// If Shva follows a long vowel, make it silent
syllable.prev &&
longVowels.includes(syllable.prev.vowel)
) {
return syllable.text.replace(/\u{05B0}/u, "");
}
if (
// If Shva is in the first syllable and is a conjunctive vav, make it vocal
(isFirstSyllable && syllable.text.includes("וְ")) ||
// If Shva follows one of the characters in vocalChars, make it vocal
vocalChars.includes(syllable.text.charAt(0)) ||
// If the next syllable contains a guttural, make Shva vocal
(nextOnset && gutturals.includes(nextOnset))
) {
return syllable.text;
}
return syllable.text.replace(/\u{05B0}/u, "");
},
},
],
};
console.log(heb.transliterate("בְּרוּכָה הַבָּאָה", schema));
console.log(heb.transliterate("הוֹלְכִים", schema));
console.log(heb.transliterate("יַלְדָּה", schema));
console.log(heb.transliterate("מִשְׁפְּחֹת", schema));
console.log(heb.transliterate("וְאָנִי", schema));
console.log(heb.transliterate("מַה שְּׁלוֹמֵךְ", schema));
Produces
brukha habba’a
holᵉkhim //wrong
yalda
mishpᵉḥot
vᵉ’ani
mah shlomekh
@charlesLoder One of the issues with long vowels is the other kinds of long vowels that aren't just below letters. הוֹלְכִים has a vav with a dot on top. In havarotjs, syllable.vowel says the vowel is null, and the letter is a vav. Which is why it isn't being picked up.
In Hebrew, certain consonants ("א", "ה", "ו", "י") can serve as matres lectionis, that is, they can represent vowel sounds. The "ו" with a dot above it (Holam Male) represents a long "o" sound.
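One way around a null `vowel` is to test the syllable's text for the vowel marks directly. A minimal sketch, assuming holam male is encoded as vav followed by U+05B9; the helper names and the choice of which marks count as long are illustrative, not part of havarotjs.

```javascript
const HOLAM = "\u05B9";
const VAV = "\u05D5";

// Assumption: holam male is encoded as vav followed by U+05B9.
const hasHolamMale = (text) => text.includes(VAV + HOLAM);

// Hypothetical helper: look for a long-vowel mark anywhere in the text
// instead of relying on a single vowel property.
// Marks: tsere, qamets, holam, holam haser for vav.
const LONG_VOWEL_MARKS = /[\u{05B5}\u{05B8}\u{05B9}\u{05BA}]/u;
const hasLongVowelMark = (text) => LONG_VOWEL_MARKS.test(text);

const ho = "\u05D4\u05D5\u05B9";        // he + vav + holam
const yal = "\u05D9\u05B7\u05DC\u05B0"; // yod + patah + lamed + sheva

console.log(hasHolamMale(ho));      // true
console.log(hasLongVowelMark(ho));  // true
console.log(hasLongVowelMark(yal)); // false
```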
Using the schema here, you can adjust the regex a little more:
- HEBREW: /\D{1,2}\u{05B0}/,
+ HEBREW: /^\D{1}[\u{05C1}|\u{05C2}]?\D{1}\u{05B0}/,
These regexes can get a little complicated.
In havarotjs, syllable.vowel says the vowel is null, and the letter is a vav. Which is why it isn't being picked up.
OK, I'll look into that a little more.
At some point, I need to sit down with both of these projects and really consider some good changes.
@charlesLoder Also, thanks a lot for continuing to help!!
EDIT: /^\D{1}[\u{05C1}|\u{05C2}]?\D{1}\u{05B0}/ seems to break the schema?
@charlesLoder Not to go full circle. I removed the letter-switching change I made, and added detection for holam male. It works for holkhim. But yalda is still acting up.
EDIT: And when I add the fix, holkhim goes back to being a vocal shva (because clusters break), but yalda works.
@charlesLoder Finally can get some sleep.
OK, so I have tested the pull request and it works. The issue with the long vowels? You can see it being tested here. /^\D{1}[\u{05C1}|\u{05C2}]?\D{1}\u{05B0}/ seems to break things, but if you can show me how to make it work properly, I'm all ears. Not 100% sure why we are testing for shin or sin? אַנְגְּלִית also doesn't work, but you'd expect that because it breaks the two-shvas-in-a-row rule.
Anyway, as a novice, I am super pleased with this. Thanks a lot!
const heb = require("hebrew-transliteration");
const brillSimple = require("hebrew-transliteration/schemas").brillSimple;
// Arrays of characters for different conditions
const gutturals = ["א", "ה", "ע"];
const vocalChars = ["י", "ל", "מ", "נ", "ר"];
const schema = {
  ...brillSimple,
  longVowels: false,
  ADDITIONAL_FEATURES: [
    {
      FEATURE: "word",
      // Regular expression to match Shva in a word
      HEBREW: /\D{1,2}\u{05B0}/,
      // Replace consecutive Shvas with a vocal Shva
      TRANSLITERATION: (word, _, schema) =>
        word.text.replace(
          /([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})([\u{05D0}-\u{05EA}][\u{0591}-\u{05C7}]*)(\u{05B0})/gu,
          `$1$3${schema["VOCAL_SHEVA"]}`
        ),
    },
    {
      FEATURE: "syllable",
      // Regular expression to match Shva at the start of a syllable
      HEBREW: /\D{1,2}\u{05B0}/,
      // Handle rules of Shva in a syllable
      TRANSLITERATION: (syllable, _, schema) => {
        const isFirstSyllable = !syllable.prev;
        const nextOnset = syllable?.next?.value?.onset;
        if (
          // If Shva follows a long vowel, make it silent
          syllable.prev &&
          /[\u{05B5}\u{05B8}\u{05B9}\u{05BA}]/u.test(syllable.prev.text)
        ) {
          return syllable.text.replace(/\u{05B0}/u, "");
        }
        if (
          // If Shva is in the first syllable and is a conjunctive vav, make it vocal
          (isFirstSyllable && syllable.text.includes("וְ")) ||
          // If Shva follows one of the characters in vocalChars, make it vocal
          vocalChars.includes(syllable.text.charAt(0)) ||
          // If the next syllable contains a guttural, make Shva vocal
          (nextOnset && gutturals.includes(nextOnset))
        ) {
          return syllable.text;
        }
        return syllable.text.replace(/\u{05B0}/u, "");
      },
    },
  ],
};
console.log(heb.transliterate("בְּרוּכָה הַבָּאָה", schema));
console.log(heb.transliterate("הוֹלְכִים", schema));
console.log(heb.transliterate("יַלְדָּה", schema));
console.log(heb.transliterate("מִשְׁפְּחֹת", schema));
console.log(heb.transliterate("וְאָנִי", schema));
console.log(heb.transliterate("מַה שְּׁלוֹמֵךְ", schema));
Returns
brukha habba’a
holkhim
yalda
mishpᵉḥot
vᵉ’ani
mah shlomekh
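For anyone who wants to test the decision logic without the library, the syllable rules above can be pulled out into a pure function. This is a sketch under assumptions: `shevaIsVocal` and the `{ text, prevText, nextOnset }` shape are hypothetical, and the syllable splits in the examples are made up for illustration rather than taken from havarotjs.

```javascript
const SHEVA = "\u05B0";
const LONG_VOWEL_MARKS = /[\u{05B5}\u{05B8}\u{05B9}\u{05BA}]/u;
const GUTTURALS = ["\u05D0", "\u05D4", "\u05E2"]; // alef, he, ayin
const VOCAL_ONSETS = ["\u05D9", "\u05DC", "\u05DE", "\u05E0", "\u05E8"]; // yod, lamed, mem, nun, resh
const VAV_SHEVA = "\u05D5" + SHEVA;

// syl is a hypothetical plain object: { text, prevText, nextOnset }.
function shevaIsVocal({ text, prevText, nextOnset }) {
  // A sheva after a syllable containing a long-vowel mark is silent
  if (prevText && LONG_VOWEL_MARKS.test(prevText)) return false;
  const isFirst = prevText === undefined;
  return (
    (isFirst && text.includes(VAV_SHEVA)) ||  // conjunctive vav
    VOCAL_ONSETS.includes(text.charAt(0)) ||  // yod, lamed, mem, nun, resh
    GUTTURALS.includes(nextOnset)             // next syllable starts with a guttural
  );
}

// bet + dagesh + sheva, followed by resh
console.log(shevaIsVocal({ text: "\u05D1\u05BC\u05B0", nextOnset: "\u05E8" })); // false
// vav + sheva at the start of a word
console.log(shevaIsVocal({ text: "\u05D5\u05B0", nextOnset: "\u05D0" })); // true
// lamed + sheva after he + vav + holam
console.log(
  shevaIsVocal({ text: "\u05DC\u05B0", prevText: "\u05D4\u05D5\u05B9", nextOnset: "\u05DB" })
); // false
// tav + dagesh + sheva before alef
console.log(shevaIsVocal({ text: "\u05EA\u05BC\u05B0", nextOnset: "\u05D0" })); // true
```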