Jerusalem: phantom yod

charlesLoder commented 4 months ago

See comment from @bdenckla

> Regarding yerushalayim and yerushalaymah, you have the opportunity to do something I never had the guts to do, which is to introduce the notion of a "phantom yod" to hold those orphan vowel marks (ḥiriq or sheva). You can get rid of about 600 cases of two vowels on a single letter that way. At the cost of introducing this "phantom yod" abstraction of course. But it might be a good trade-off.
>
> Here's a cool feature that would pretty much just "fall out" of this representation for free: the option to show this ketiv/qere explicitly instead of implicitly. See, for example, the treatment of yerushalayim in the recent JPS commentary on Psalms 120-150, e.g.:

Ben, could you comment a little more on how you imagine that would work?

Some of my stray thoughts:

- Generally, I try to keep the text "as is", with a few minor modifications. Jerusalem is definitely one of the largest edge cases, so I'm not against handling it better.
- Every class has a .original property that preserves the actual text used as input.
- Would the .text property return the text with an inserted yod?
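
To make that last question concrete, here is a minimal sketch of what inserting a phantom yod could look like. `insertPhantomYod` is a hypothetical helper, not part of havarotjs. It assumes the orphan vowel (ḥiriq or sheva) sits on a lamed that also carries a pataḥ or qamets, and it leaves every other mark (accents, meteg) on the lamed, which is a simplification.

```javascript
// Hypothetical helper, NOT part of havarotjs: a sketch of the "phantom yod" idea.
const LAMED = "\u05DC";
const YOD = "\u05D9";
const HIRIQ = "\u05B4";
const SHEVA = "\u05B0";
const PATAH = "\u05B7";
const QAMETS = "\u05B8";

// A lamed followed by its run of combining marks (accents, vowels, dots, CGJ).
const LAMED_CLUSTER = /\u05DC([\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7\u034F]+)/g;

function insertPhantomYod(word) {
  return word.replace(LAMED_CLUSTER, (cluster, marks) => {
    const hasFullVowel = marks.includes(PATAH) || marks.includes(QAMETS);
    const orphan = [HIRIQ, SHEVA].find((vowel) => marks.includes(vowel));
    // A lamed with a single vowel is well behaved: leave it alone.
    if (!hasFullVowel || !orphan) return cluster;
    // Move the orphan vowel off the lamed and onto an inserted yod.
    const kept = marks.replace(orphan, "");
    return LAMED + kept + YOD + orphan;
  });
}

// yerushalayim as written: the lamed carries both hiriq and patah.
const ketiv = "\u05D9\u05B0\u05E8\u05D5\u05BC\u05E9\u05C1\u05B8\u05DC\u05B4\u05B7\u05DD";
const qere = insertPhantomYod(ketiv);
console.log(qere.length - ketiv.length); // 1 (one yod was added)
```

The check is order-agnostic on purpose, since the two vowels on the lamed may appear in either order depending on how the input was normalized.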

bdenckla commented 4 months ago

There may be no need for my "phantom yod" idea if there is a general feature that allows a pointed qere (.text) to sometimes be present, when it needs to differ from the pointed ketiv (.original). (Or, equivalently, .text is always present, but only sometimes differs from .original.)

If you had such a feature, I imagine it would apply to not only the 600 or so cases of Jerusalem-related words, but also perhaps to other tricky cases we've discussed elsewhere:

- the huge number of cases of YHVH-related words
  - adonai-spelled-YHVH
  - elohim-spelled-YHVH
- hi-spelled-הוא (only in Torah?!)
- the rare cases involving proper nouns (in particular, personal names):
  - Yissakhar
  - Yiriyah
  - Meḥiyael
- one weird case in Deut. 32:6 where there is (in some editions) a space between the ha and ladonai of what (I speculate) should be read as haladonai. I.e. a single word with a space in it, if that's not too contradictory a concept! (Not totally unprecedented in European names, since they can have a "nobiliary particle". For instance "von" is not the middle name of Georg von Trapp.)

On the other hand, the generality of this .text/.original representation can be viewed as a weakness not a strength. The weakness is that it doesn't explicitly represent the difference between the ketiv and the qere. Of course, that difference can be automatically derived. But if a client of the API wants to highlight (literally or metaphorically) the difference between ketiv and qere, it would be convenient for the client to not have to derive the diff itself.
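
As a rough illustration of what "deriving the diff" would mean for a client, here is a minimal sketch that compares a ketiv string with a qere string by trimming their common prefix and suffix. `diffKetivQere` and the shape of its result are hypothetical, not anything the package provides.

```javascript
// Hypothetical client-side helper: derive the ketiv/qere difference
// from two strings by stripping the shared prefix and suffix.
function diffKetivQere(ketiv, qere) {
  const k = Array.from(ketiv);
  const q = Array.from(qere);

  // Length of the common prefix.
  let start = 0;
  while (start < k.length && start < q.length && k[start] === q[start]) {
    start++;
  }

  // Walk back from the ends to find the common suffix.
  let endK = k.length;
  let endQ = q.length;
  while (endK > start && endQ > start && k[endK - 1] === q[endQ - 1]) {
    endK--;
    endQ--;
  }

  return {
    prefix: k.slice(0, start).join(""),
    ketivOnly: k.slice(start, endK).join(""),
    qereOnly: q.slice(start, endQ).join(""),
    suffix: k.slice(endK).join(""),
  };
}

// hi' spelled with a vav (ketiv) versus with a yod (qere).
const diff = diffKetivQere("\u05D4\u05B4\u05D5\u05D0", "\u05D4\u05B4\u05D9\u05D0");
console.log(diff.ketivOnly === "\u05D5"); // true (vav)
console.log(diff.qereOnly === "\u05D9"); // true (yod)
```

This works cleanly when the two strings differ in one contiguous span. It gets noisier when the marks on a letter are stored in a different order in the two strings, which is one reason an explicit representation would be more convenient.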

For instance, it might be cool to "call out" the phantom yod in transliteration by making whatever letter represents it (probably "y") gray-colored or something. Sort of the opposite of literal highlighting (backgrounding rather than foregrounding) but you get what I mean. Or it might be cool to make the Hebrew yod gray, although that's pretty hard to do since it is difficult to control the color of a letter independently from its diacritics.
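
For the transliteration case, a minimal sketch might look like this, assuming the transliteration arrives as segments with a `phantom` flag. The segment shape, the `renderTransliteration` name, and the `phantom` CSS class are all made up for illustration.

```javascript
// Hypothetical rendering helper: wrap phantom letters in a span so a
// stylesheet can gray them out (e.g. `.phantom { color: gray; }`).
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderTransliteration(segments) {
  return segments
    .map(({ text, phantom }) =>
      phantom
        ? `<span class="phantom">${escapeHtml(text)}</span>`
        : escapeHtml(text)
    )
    .join("");
}

const html = renderTransliteration([
  { text: "yerushala" },
  { text: "y", phantom: true },
  { text: "im" },
]);
console.log(html); // yerushala<span class="phantom">y</span>im
```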

bdenckla commented 4 months ago

Another thing you may want to consider is whether you want to provide some functionality to help with the (dreaded) superimposed representation of dually-cantillated words. In a way these words are distant cousins of the "implicit ketiv/qere" words I discussed above, if you "buy" the following analogy:

- Implicit ketiv/qere words are "badly behaved" but have a well-behaved interpretation.
- Words with superimposed cantillation are "badly behaved" but have TWO well-behaved interpretations.

The words with superimposed cantillation include a few that are sort of the opposite of that weird haladonai Deut. 32:6 case I mentioned above: what looks like a single chanted word in the input becomes more than one chanted word in the individual outputs. E.g. what looks like a single chanted word, לֹֽ֣א־יִהְיֶ֥͏ֽה־לְךָ֛֩, becomes the following:

- two chanted words לֹֽא־יִהְיֶ֥ה לְךָ֛ in the taḥton cantillation (note the "a-b c" (2+1) division)
- two chanted words לֹ֣א יִהְיֶֽה־לְךָ֩ in the elyon cantillation (note the "a b-c" (1+2) division)
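
One way to picture the data, as a sketch only: keep the superimposed form next to its two readings and count the maqaf-joined pieces in each chanted word. The object shape and the `chantedWordShape` helper are hypothetical. The sketch does not attempt the hard part, which is splitting the superimposed accents automatically.

```javascript
// Hypothetical representation of one dually-cantillated word.
const MAQAF = "\u05BE";

const dualCantillation = {
  superimposed: "לֹֽ֣א־יִהְיֶ֥͏ֽה־לְךָ֛֩",
  tahton: "לֹֽא־יִהְיֶ֥ה לְךָ֛",
  elyon: "לֹ֣א יִהְיֶֽה־לְךָ֩",
};

// For each chanted word (space-separated), count its maqaf-joined parts.
function chantedWordShape(reading) {
  return reading.split(" ").map((word) => word.split(MAQAF).length);
}

console.log(chantedWordShape(dualCantillation.superimposed)); // [ 3 ]
console.log(chantedWordShape(dualCantillation.tahton)); // [ 2, 1 ]
console.log(chantedWordShape(dualCantillation.elyon)); // [ 1, 2 ]
```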

charlesLoder commented 4 months ago

Ok returning to this after going down a rabbit hole with the other issue.

> There may be no need for my "phantom yod" idea if there is a general feature that allows a pointed qere (.text) to sometimes be present, when it needs to differ from the pointed ketiv (.original). (Or, equivalently, .text is always present, but only sometimes differs from .original.)

The difference between .original and .text is not in terms of ketiv/qere, but rather in how this package handles characters for syllabification.

Example:

import { Text } from "havarotjs";

const text = new Text("חָפְנִי֙");
console.log(text.text === text.original);
// false, because the `.text` has a qamets qatan character whereas the `.original` does not

For the Divine Name, I don't syllabify it so the .text and .original are the same, but I do have a .isDivineName prop for it.

I do want to home in on how to handle the implicit ketiv/qeres for the Divine Name, Jerusalem, and hi'.

Perhaps a property that allows a user to pass in a ketiv and set a qere could be helpful.

Example:

const text = new Text("הִ֖וא בֵּֽית־אֵ֑ל ה֖וּא וְכׇל־הָעָ֥ם אֲשֶׁר־עִמּֽוֹ׃", {
    ketivQeres: [
        {
            input: "הִוא",
            output: "הִ֖וא",
            ignoreTaamim: true // not sure about this setting
        }
    ]    
});

text.words[0].original // הִ֖וא
text.words[0].text // הִ֖וא

And maybe Jerusalem (and its inflected variants) and hi' could be default ones.

I want the package to be flexible enough to handle everything, but I don't want to have to account for everything.

bdenckla commented 4 months ago

> And maybe Jerusalem (and its inflected variants) and hi' could be default ones.

Yes, it would be nice to have some defaults for the common cases at least. Particularly for these common cases, the k/q is best described in some compact, generalized form (like a regular expression) because there are an unwieldy number of cases to describe explicitly (e.g. over 600 cases of Jerusalem-related words).
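
A minimal sketch of what such a compact default could look like, using the hi' case. The rule table, the `applyKetivQere` helper, and the pattern itself are hypothetical and deliberately incomplete (for example, the pattern ignores prefixed forms); it only shows how one regular expression can stand in for many explicit entries.

```javascript
// Hypothetical default rules: each rule is a pattern plus a replacement.
const DEFAULT_KETIV_QERES = [
  {
    name: "hi' spelled with vav",
    // he + hiriq + optional accents/meteg + vav + alef, as a whole word
    pattern: /^\u05D4\u05B4([\u0591-\u05AF\u05BD]*)\u05D5\u05D0$/,
    // same word with the vav swapped for a yod; accents are kept
    replacement: "\u05D4\u05B4$1\u05D9\u05D0",
  },
];

// Return both forms of a word; `text` differs only when a rule matches.
function applyKetivQere(word, rules = DEFAULT_KETIV_QERES) {
  for (const rule of rules) {
    if (rule.pattern.test(word)) {
      return {
        original: word,
        text: word.replace(rule.pattern, rule.replacement),
        rule: rule.name,
      };
    }
  }
  return { original: word, text: word, rule: null };
}

// he + hiriq + tipcha + vav + alef
const result = applyKetivQere("\u05D4\u05B4\u0596\u05D5\u05D0");
console.log(result.text === "\u05D4\u05B4\u0596\u05D9\u05D0"); // true
console.log(result.rule); // hi' spelled with vav
```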

charlesLoder commented 4 months ago

Good thought! I can basically recreate something like the ADDITIONAL_FEATURES from the transliteration package.

charlesLoder commented 4 months ago

See new issue, closing this

charlesLoder / havarotjs

Jerusalem: phantom yod #165