Add `taam`, `taamName` and `hasTaamName` to `Syllable` #158

charlesLoder commented 6 months ago

The API for taamim should mirror the one for vowels.

bdenckla commented 5 months ago

Responding here rather than on Twitter/X. To investigate your questions about "accent multiplicity," I would suggest working up through the levels in the following order:

letter (multiple accents per letter)
syllable (multiple accents per syllable)
atom (multiple accents per atom)
maqaf compound (multiple accents per maqaf compound)

(Atoms differ only very slightly from maqaf compounds with respect to these matters. One small difference is that some two-part accents can spread across the atoms of a compound, but most are restricted to a single atom.)

Summarizing your responses and those of others so far, here are some cases of multiple accents on a single letter:

superimposed cantillation in the Decalogues
ole veyored (e.g. אָ֥֫תָּה in Job 8.6)

Adding some of my own (this list is not exhaustive):

superimposed cantillation in Genesis 35:22
revia mugrash (e.g. כ֝֗וּשׁ in Ps.7.1)
telisha gedolah can appear on the same letter as geresh or gershayim (exactly how many such cases depends on the edition)
some conjunctives can appear on the same letter as a non-stressed prepositive, e.g.:
- munaḥ can appear on the same letter as a deḥi where that deḥi does not indicate stress. E.g. שֶׁ֣֭עֹמְדִים in Ps 135.2.
- merkha can appear on the same letter as a geresh muqdam where that geresh muqdam does not indicate stress (its revia is remote). E.g. תּ֥֝וֹרָתְךָ֗ in Ps 119.61. (I'm not sure if it is called merkha in the poetic system, but my meaning is clear regardless)

If you consider gaʿya (meteg) an accent (debatable):

Gaʿya can appear on the same letter as most (maybe all?) non-stressed prepositives, e.g.:
- deḥi where that deḥi does not indicate stress
- geresh muqdam where that geresh muqdam does not indicate stress (its revia is remote)
- telisha gedolah where that telisha gedolah does not indicate stress (it has a stress helper)
Gaʿya can appear on the same letter as an ole where that ole's yored is remote, e.g. וְהִתְח֢וֹלֵֽ֫ל ל֥וֹ in Ps 37.7.
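To check a text for the letter-level cases listed above, a minimal sketch (not havarotjs code) can attach each combining mark to the Hebrew letter before it and count the cantillation marks per letter. It assumes accents occupy U+0591..U+05AE, keeps U+05BD (meteg/gaʿya) in a separate flag because its status as an accent is debatable, and ignores context-dependent distinctions such as stressed vs. non-stressed prepositives. The names `accentsPerLetter` and `multiAccentLetters` are hypothetical.

```typescript
// Hypothetical helper (not havarotjs): pair each Hebrew letter with the
// accents (taamim) among the combining marks that follow it.
interface LetterMarks {
  letter: string;
  accents: string[]; // cantillation marks, U+0591..U+05AE
  hasMeteg: boolean; // U+05BD meteg/gaya, tracked apart from the accents
}

const LETTER = /[\u05D0-\u05EA]/;
const ACCENT = /[\u0591-\u05AE]/;
const METEG = "\u05BD";
// Combining marks: CGJ, accents, points, dagesh, meteg, rafe, shin/sin dots,
// upper/lower dots, qamats qatan
const MARK = /[\u034F\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/;

function accentsPerLetter(text: string): LetterMarks[] {
  const letters: LetterMarks[] = [];
  let current: LetterMarks | null = null;
  // split("") yields UTF-16 code units; Hebrew letters and marks are all in the BMP
  for (const ch of text.split("")) {
    if (LETTER.test(ch)) {
      current = { letter: ch, accents: [], hasMeteg: false };
      letters.push(current);
    } else if (!MARK.test(ch)) {
      current = null; // a space, maqaf, sof pasuq, etc. ends the current letter
    } else if (current !== null && ACCENT.test(ch)) {
      current.accents.push(ch);
    } else if (current !== null && ch === METEG) {
      current.hasMeteg = true;
    }
  }
  return letters;
}

// Letters with two or more accents; optionally count a meteg as one more
function multiAccentLetters(text: string, countMeteg = false): LetterMarks[] {
  return accentsPerLetter(text).filter(
    (l) => l.accents.length + (countMeteg && l.hasMeteg ? 1 : 0) >= 2
  );
}
```

Applied to כ֝֗וּשׁ (Ps 7:1), this should flag the kaf with its two accents; a letter carrying a single prepositive plus a gaʿya is flagged only when `countMeteg` is `true`.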

charlesLoder commented 5 months ago

Thanks! I'll have to ruminate on this.

As for the API, I did a fairly big refactor of the Char object to make these higher-level APIs a little simpler.

For the Word, Syllable, and Cluster objects, I may refactor the API like this:

Current

.vowel returns the vowel character
.vowelName returns the partial Unicode name of vowel character
.hasVowelName returns a boolean indicating whether the given vowel name is in the object

New

.vowels returns an array of vowel characters (not really something that would happen, but doing it for consistency below)
.vowelNames returns an array of partial Unicode names of vowel characters
.taam returns the first taam character
.taamName returns the partial Unicode name of the first taam character
.hasTaamName returns a boolean indicating whether the given taam name is in the object
.taamim (add alias of .taams to keep English plural consistency) returns an array of taam characters
.taamNames returns an array of the partial Unicode names of the taam characters

My thought is that the singular APIs would be what most people anticipate (one vowel, one taam), and they would be backwards compatible. The plural APIs would allow for more precision.
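One way to get that backwards compatibility is to derive every singular accessor from its plural counterpart. The sketch below is a hypothetical stand-alone class, not the havarotjs `Cluster`; its name tables cover only a handful of characters, and it returns an empty array (rather than `[null]`) when a cluster has no vowels or taamim.

```typescript
// Hypothetical stand-alone sketch of the proposed singular/plural API.
// The name tables are deliberately partial.
const VOWEL_NAMES: { [ch: string]: string } = {
  "\u05B0": "SHEVA",
  "\u05B4": "HIRIQ",
  "\u05B7": "PATAH",
  "\u05B8": "QAMATS",
};
const TAAM_NAMES: { [ch: string]: string } = {
  "\u0591": "ETNAHTA",
  "\u059C": "GERESH",
  "\u05A5": "MERKHA",
  "\u05A9": "TELISHA_QETANA",
};
const has = (table: { [ch: string]: string }, ch: string): boolean =>
  Object.prototype.hasOwnProperty.call(table, ch);

class ClusterSketch {
  readonly text: string;
  constructor(text: string) {
    this.text = text;
  }

  // Plural accessors: every matching mark, in string order
  get vowels(): string[] {
    return this.text.split("").filter((ch) => has(VOWEL_NAMES, ch));
  }
  get vowelNames(): string[] {
    return this.vowels.map((ch) => VOWEL_NAMES[ch]);
  }
  get taamim(): string[] {
    return this.text.split("").filter((ch) => has(TAAM_NAMES, ch));
  }
  get taams(): string[] {
    return this.taamim; // alias for English plural consistency
  }
  get taamNames(): string[] {
    return this.taamim.map((ch) => TAAM_NAMES[ch]);
  }

  // Singular accessors: first element of the plural list, or null,
  // so code written for "one vowel, one taam" keeps working
  get vowel(): string | null {
    return this.vowels.length > 0 ? this.vowels[0] : null;
  }
  get vowelName(): string | null {
    return this.vowelNames.length > 0 ? this.vowelNames[0] : null;
  }
  get taam(): string | null {
    return this.taamim.length > 0 ? this.taamim[0] : null;
  }
  get taamName(): string | null {
    return this.taamNames.length > 0 ? this.taamNames[0] : null;
  }
  hasVowelName(name: string): boolean {
    return this.vowelNames.indexOf(name) !== -1;
  }
  hasTaamName(name: string): boolean {
    return this.taamNames.indexOf(name) !== -1;
  }
}
```

For the tav of מִתָּ֑͏ַ֜חַת, `taam` would be the etnahta, while `taamNames` lists both `ETNAHTA` and `GERESH`.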

Would that api make sense to you?

I still want to think about how to handle the meteg/gaya. I could give the meteg its own API: .hasMeteg, etc. Ditto for the masora circle. They're not accents, but they operate in a liminal space.

charlesLoder commented 5 months ago

I didn't add the masora circle to the taamim:

https://github.com/charlesLoder/havarotjs/blob/306ed6f17ca6c2414b569a7e653adaccda9204de/src/utils/regularExpressions.ts#L111-L118

But I also don't have the meteg character anywhere. So gotta add that!
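A sketch of how the meteg and the masora circle could be kept out of the taamim while still being queryable. These character classes are illustrative assumptions, not the contents of havarotjs's regularExpressions.ts; `classifyMark`, `hasMeteg`, `hasMasoraCircle`, and `hasTaam` are hypothetical names echoing the `.hasMeteg` idea.

```typescript
// Illustrative character classes (assumptions, not havarotjs's regularExpressions.ts).
// No "g" flag, so .test() keeps no state between calls.
const TAAMIM = /[\u0591-\u05AE]/; // HEBREW ACCENT ETNAHTA .. HEBREW ACCENT ZINOR
const MASORA_CIRCLE = /\u05AF/; // HEBREW MARK MASORA CIRCLE, kept out of the taamim
const METEG = /\u05BD/; // HEBREW POINT METEG (gaya)

type MarkKind = "taam" | "meteg" | "masoraCircle" | "other";

function classifyMark(ch: string): MarkKind {
  if (TAAMIM.test(ch)) return "taam";
  if (METEG.test(ch)) return "meteg";
  if (MASORA_CIRCLE.test(ch)) return "masoraCircle";
  return "other";
}

// Separate predicates in the spirit of a dedicated .hasMeteg API
function hasMeteg(text: string): boolean {
  return METEG.test(text);
}

function hasMasoraCircle(text: string): boolean {
  return MASORA_CIRCLE.test(text);
}

function hasTaam(text: string): boolean {
  return TAAMIM.test(text);
}
```

Keeping the three classes disjoint means a meteg never inflates a taam count, while still letting callers ask about it directly.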

bdenckla commented 5 months ago

For units larger than the cluster (syllable and word), when you are returning multiple results (e.g. multiple taamim), can these results be easily related back to the cluster they belong to, or are they just a list? I would imagine wanting a sparse array of some sort. E.g. for a three-cluster syllable with taamim a and b on clusters 1 and 3 respectively, I would imagine wanting the answer to the question "what are the taamim on this syllable" to be something like [a, null, b].

bdenckla commented 5 months ago

Is Word what I call an atom?

charlesLoder commented 5 months ago

Is Word what I call an atom?

I think so. Something like "וְכׇל־הָעָם֩" is 2 Words
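To make the Word/atom distinction concrete, a maqaf compound can be split into atoms at each maqaf (U+05BE). This is a hypothetical helper, not the havarotjs tokenizer; in particular, keeping the maqaf attached to the preceding atom is an assumption.

```typescript
// Hypothetical helper: split a maqaf compound into atoms ("Words").
// Assumption: each maqaf (U+05BE) stays attached to the atom before it.
const MAQAF = "\u05BE";

function splitAtoms(compound: string): string[] {
  const atoms: string[] = [];
  let start = 0;
  for (let i = 0; i < compound.length; i++) {
    if (compound.charAt(i) === MAQAF) {
      atoms.push(compound.slice(start, i + 1)); // include the maqaf
      start = i + 1;
    }
  }
  if (start < compound.length) atoms.push(compound.slice(start));
  return atoms;
}
```

With this rule, `splitAtoms("וְכׇל־הָעָם֩")` returns two strings, `וְכׇל־` and `הָעָם֩`.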

For units larger than the cluster (syllable and word), when you are returning multiple results (e.g. multiple taamim), can these results be easily related back to the cluster they belong to, or are they just a list? I would imagine wanting a sparse array of some sort. E.g. for a three-cluster syllable with taamim a and b on clusters 1 and 3 respectively, I would imagine wanting the answer to the question "what are the taamim on this syllable" to be something like [a, null, b].

Not exactly, but you could drill down into them.

Example:

import { Text } from "havarotjs";

const text = new Text("י֥וֹם֩");  // Deut 5:12
const word = text.words[0]; // .words is an array of `Words`
const syllable = word.syllables[0];

syllable.taamim
// ["MERKHA", "TELISHA_QETANA"]

syllable.clusters.map(c => c.taamim);
// [["MERKHA"], [null], ["TELISHA_QETANA"]]

The results would be strings.

Maybe a verbose property or method would be good:

syllable.taamimVerbose
// [ { taam:"MERKHA", cluster: <POINTER> }, { taam:"TELISHA_QETANA", cluster: <POINTER> } ]
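The flat, per-cluster, and verbose views discussed here can all be derived from cluster-level data. In this sketch, `ClusterLike`, `flatTaamim`, `sparseTaamim`, and `taamimVerbose` are hypothetical; each verbose entry carries a cluster index alongside the object reference, and the sparse view keeps only the first taam of each cluster.

```typescript
// Hypothetical data shapes; the real havarotjs classes are richer.
interface ClusterLike {
  text: string;
  taamim: string[]; // taam names on this cluster, e.g. ["MERKHA"]
}

interface TaamWithCluster {
  taam: string;
  clusterIndex: number; // position within the syllable
  cluster: ClusterLike; // reference back to the cluster object
}

// Flat list, like the proposed syllable.taamim
function flatTaamim(clusters: ClusterLike[]): string[] {
  const flat: string[] = [];
  clusters.forEach((c) => {
    c.taamim.forEach((t) => flat.push(t));
  });
  return flat;
}

// Sparse view: one slot per cluster holding its first taam, or null
function sparseTaamim(clusters: ClusterLike[]): (string | null)[] {
  return clusters.map((c) => (c.taamim.length > 0 ? c.taamim[0] : null));
}

// Verbose view in the spirit of the suggested taamimVerbose
function taamimVerbose(clusters: ClusterLike[]): TaamWithCluster[] {
  const out: TaamWithCluster[] = [];
  clusters.forEach((cluster, clusterIndex) => {
    cluster.taamim.forEach((taam) => {
      out.push({ taam, clusterIndex, cluster });
    });
  });
  return out;
}
```

For the three clusters of י֥וֹם֩, `sparseTaamim` gives `["MERKHA", null, "TELISHA_QETANA"]`, the `[a, null, b]` shape described earlier.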

bdenckla commented 5 months ago

I see. Indeed, I now see how there's no need for a sparse output at the syllable (or presumably word) level since it can so easily be generated as you suggest:

syllable.clusters.map(c => c.taamim);

I'm not sure I see the need for your suggested taamimVerbose, but who knows. It's hard to imagine in advance what applications might need or want.

bdenckla commented 5 months ago

.vowels returns an array of vowel characters (not really something that would happen, but doing it for consistency below)

BTW in editions with superimposed cantillation of the Decalogues, there are two words in each of the two Decalogues (for a total of four words) for which there are not only two accents but also two vowels.

Also, the implicit ketiv/qere for yerushalayim and yerushalaymah is usually encoded with two vowels on the lamed. The lamed has its expected "a" vowel (qamats or pataḥ) as well as adopting the orphan ḥiriq or sheva.
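Both cases, the Decalogue words and yerushalayim/yerushalaymah, surface as a letter with more than one vowel point. A hypothetical helper for finding them, assuming vowels are U+05B0..U+05BB plus U+05C7 (qamats qatan) and that the input is a single word:

```typescript
// Hypothetical helper: list the vowel points on each letter of a single word.
const LETTER_RE = /[\u05D0-\u05EA]/;
const VOWEL_RE = /[\u05B0-\u05BB\u05C7]/; // SHEVA..QUBUTS, plus QAMATS QATAN

interface LetterVowels {
  letter: string;
  vowels: string[];
}

function vowelsPerLetter(word: string): LetterVowels[] {
  const out: LetterVowels[] = [];
  for (const ch of word.split("")) {
    if (LETTER_RE.test(ch)) {
      out.push({ letter: ch, vowels: [] });
    } else if (VOWEL_RE.test(ch) && out.length > 0) {
      out[out.length - 1].vowels.push(ch); // attach to the preceding letter
    }
  }
  return out;
}

// Letters with two or more vowel points (Decalogue words, yerushalayim, ...)
function multiVowelLetters(word: string): LetterVowels[] {
  return vowelsPerLetter(word).filter((l) => l.vowels.length > 1);
}
```

The helper only detects the pattern; telling the two underlying phenomena apart needs more context than the vowel count.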

charlesLoder commented 5 months ago

BTW in editions with superimposed cantillation of the Decalogues, there are two words in each of the two Decalogues (for a total of four words) for which there are not only two accents but also two vowels.

I'm learning something new every day!

bdenckla commented 5 months ago

Regarding yerushalayim and yerushalaymah, you have the opportunity to do something I never had the guts to do, which is to introduce the notion of a "phantom yod" to hold those orphan vowel marks (ḥiriq or sheva). You can get rid of about 600 cases of two vowels on a single letter that way. At the cost of introducing this "phantom yod" abstraction of course. But it might be a good trade-off.

Here's a cool feature that would pretty much just "fall out" of this representation for free: the option to show this ketiv/qere explicitly instead of implicitly. See, for example, the treatment of yerushalayim in the recent JPS commentary on Psalms 120-150.

The MAM dataset sort of encodes this "phantom yod" idea via its מ:ירושלם template.
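As a rough sketch of the "phantom yod" idea, assume the usual encoding in which the lamed carries both its "a" vowel (pataḥ or qamats) and the orphan ḥiriq or sheva. The explicit ketiv/qere form can then be produced by moving the orphan vowel onto an inserted yod. `explicitPhantomYod` is a hypothetical name, and this is neither how MAM's template nor havarotjs works. It should only be applied to yerushalayim/yerushalaymah, since it rewrites any lamed that carries both kinds of vowel.

```typescript
// Hypothetical "phantom yod" rewrite for the implicit ketiv/qere in
// yerushalayim / yerushalaymah. Assumes the lamed carries an "a" vowel
// (patah or qamats) plus an orphan hiriq or sheva.
const LAMED = "\u05DC";
const YOD = "\u05D9";
const CGJ = "\u034F";
const A_VOWELS = ["\u05B7", "\u05B8"]; // PATAH, QAMATS
const ORPHAN_VOWELS = ["\u05B4", "\u05B0"]; // HIRIQ, SHEVA
const COMBINING = /[\u034F\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/;

function explicitPhantomYod(word: string): string {
  const chars = word.split("");
  const out: string[] = [];
  let i = 0;
  while (i < chars.length) {
    const ch = chars[i];
    out.push(ch);
    i++;
    if (ch !== LAMED) continue;
    // Collect the lamed's combining marks
    let marks: string[] = [];
    while (i < chars.length && COMBINING.test(chars[i])) {
      marks.push(chars[i]);
      i++;
    }
    let orphan = -1;
    let hasA = false;
    for (let k = 0; k < marks.length; k++) {
      if (A_VOWELS.indexOf(marks[k]) !== -1) hasA = true;
      else if (ORPHAN_VOWELS.indexOf(marks[k]) !== -1) orphan = k;
    }
    if (hasA && orphan !== -1) {
      const orphanMark = marks[orphan];
      // Drop the orphan (and any CGJ, assumed to exist only to keep the two
      // vowels in order), then re-attach the orphan to an explicit yod
      marks = marks
        .filter((m, k) => k !== orphan && m !== CGJ)
        .concat([YOD, orphanMark]);
    }
    for (const m of marks) out.push(m);
  }
  return out.join("");
}
```

Under these assumptions, יְרוּשָׁלִַם becomes יְרוּשָׁלַיִם, and yerushalaymah gets a yod carrying the sheva; accents on the lamed stay on the lamed.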

benemanuel commented 4 months ago

I made a list of cases with two taamim on one letter. It is geared more toward the Letteris edition, but it might be helpful. https://benemanuel.geulah.org.il/two-is-not-one-%D7%91-%D7%98%D7%A2%D7%9E%D7%99%D7%9D-%D7%91%D7%9E%D7%99%D7%9C%D7%94-%D7%90%D7%97%D7%93

bdenckla commented 4 months ago

Thanks @benemanuel for bringing the Ezekiel 20:31 one to my attention. I have it in my list (not published) but overlooked it. MAM has some documentation about it:

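אַתֶּם֩ נִטְמְאִ֤֨ים = א, ל, ק (the Aleppo, Leningrad, and Cairo codices) and the Masorah of א, ל, ק; see Yeivin 28.1, p. 232. This is the only word in the entire Bible with two conjunctive accents in one syllable. In reading, the qadma precedes the mahpakh, as in six other places in the Bible (where the qadma stands in the position proper to a gaʿya and the mahpakh is on the stressed syllable), e.g. Leviticus 25:46; Numbers 21:1.
מג״ה (Miqraot Gedolot HaKeter) in print = נִטְמְאִ֤֙ים with pashta (even though in the manuscript it is clearly a qadma)
Printed editions = אַתֶּ֨ם נִטְמְאִ֤ים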

charlesLoder commented 4 months ago

Thanks all! This is helpful data, especially for testing.

charlesLoder commented 4 months ago

BTW in editions with superimposed cantillation of the Decalogues, there are two words in each of the two Decalogues (for a total of four words) for which there are not only two accents but also two vowels.

@bdenckla Are there digital editions with this? I couldn't find any word with two vowels.

bdenckla commented 4 months ago

BTW in editions with superimposed cantillation of the Decalogues, there are two words in each of the two Decalogues (for a total of four words) for which there are not only two accents but also two vowels.

@bdenckla Are there digital editions with this? I couldn't find any word with two vowels.

MAM Exo 20:2 & Deut 5:6 עַל־פָּנָֽ͏ַ֗י
MAM Exo 20:3 & Deut 5:7 מִתָּ֑͏ַ֜חַת

UXLC has 3 of those 4, but its verse numbering is off by one:

Exo 20:3 & Deut 5:7 פָּנָֽ͏ַ֗י
Exo 20:4 (no corresponding Deut.!) מִתָּ֑͏ַ֜חַת

charlesLoder commented 4 months ago

Thanks! This is some helpful data.

My ideas for this have started to spiral out a bit, but I like it. Here's an example on Clusters:

import { Text } from "havarotjs";

const clusters = new Text("מִתָּ֑͏ַ֜חַת").clusters;
console.log(
  clusters.map((c) => {
    return {
      text: c.text,
      consonant: c.consonant,
      consonants: c.consonants,
      consonantName: c.consonantName,
      consonantNames: c.consonantNames,
      taam: c.taam,
      taamim: c.taamim,
      taamName: c.taamName,
      taamimNames: c.taamimNames,
      vowel: c.vowel,
      vowels: c.vowels,
      vowelName: c.vowelName,
      vowelNames: c.vowelNames
    };
  })
);

Results:

[
  {
    text: 'מִ',
    consonant: 'מ',
    consonants: [ 'מ' ],
    consonantName: 'MEM',
    consonantNames: [ 'MEM' ],
    taam: null,
    taamim: [ null ],
    taamName: null,
    taamimNames: [ null ],
    vowel: 'ִ',
    vowels: [ 'ִ' ],
    vowelName: 'HIRIQ',
    vowelNames: [ 'HIRIQ' ]
  },
  {
    text: 'תַָּ֑֜͏',
    consonant: 'ת',
    consonants: [ 'ת' ],
    consonantName: 'TAV',
    consonantNames: [ 'TAV' ],
    taam: '֑',
    taamim: [ '֑', '֜' ],
    taamName: 'ETNAHTA',
    taamimNames: [ 'ETNAHTA', 'GERESH' ], // note two taamim
    vowel: 'ָ',
    vowels: [ 'ָ', 'ַ' ],
    vowelName: 'QAMATS',
    vowelNames: [ 'QAMATS', 'PATAH' ] // note two vowels
  },
  {
    text: 'חַ',
    consonant: 'ח',
    consonants: [ 'ח' ],
    consonantName: 'HET',
    consonantNames: [ 'HET' ],
    taam: null,
    taamim: [ null ],
    taamName: null,
    taamimNames: [ null ],
    vowel: 'ַ',
    vowels: [ 'ַ' ],
    vowelName: 'PATAH',
    vowelNames: [ 'PATAH' ]
  },
  {
    text: 'ת',
    consonant: 'ת',
    consonants: [ 'ת' ],
    consonantName: 'TAV',
    consonantNames: [ 'TAV' ],
    taam: null,
    taamim: [ null ],
    taamName: null,
    taamimNames: [ null ],
    vowel: null,
    vowels: [ null ],
    vowelName: null,
    vowelNames: [ null ]
  }
]

bdenckla commented 4 months ago

This is cool to be able to handle these extraordinary words, but I think it would also be fine to just return an error saying that such words are not supported. Or return the result for the first of the two vowels along with a warning.

Remember that these words' very existence is an artifact of a particular (unfriendly) choice of representation used in the great manuscripts. These words do not appear in this unfriendly form in publications intended to be read aloud or chanted. Thus, arguably, this unfriendly form need not be supported by phonetic transcription software. But maybe you're aiming for a general representation here independent of the application of phonetic transcription.

Any thoughts on the representation of the (far more common and more important) dual vowels in yerushalayim and yerushalaymah?

If all you're trying to do is represent the Unicode in a structured form, then I guess yerushalayim and the "QUPO" words can be handled the same. (I call the dual-vowel Decalogue words "QUPO" words because they both consist of qamats, an under-accent, pataḥ, and an over-accent.)

But more deeply, i.e. semantically, the reason for the dual vowel in yerushalayim is very different than the reason for the dual vowel in a QUPO word.
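In a QUPO word, a COMBINING GRAPHEME JOINER (U+034F) separates the lower layer (qamats plus under-accent) from the upper layer (pataḥ plus over-accent). A hypothetical helper can therefore split the word into its two alternative readings. It assumes one CGJ per word and treats dagesh, rafe, and shin/sin dots as shared by both readings. A CGJ can also appear for other reasons (for instance, to preserve mark order), so its presence alone does not prove a QUPO word.

```typescript
// Hypothetical helper: split a QUPO word at the COMBINING GRAPHEME JOINER
// into its two alternative readings. Assumes at most one CGJ per word.
const CGJ = "\u034F";
const BASE_LETTER = /[\u05D0-\u05EA]/;
const MARK = /[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/;
const SHARED_MARKS = ["\u05BC", "\u05BF", "\u05C1", "\u05C2"]; // DAGESH, RAFE, SHIN DOT, SIN DOT

function splitQupo(word: string): [string, string] | null {
  const cgj = word.indexOf(CGJ);
  if (cgj === -1) return null;

  // The letter that owns the CGJ: nearest base letter before it
  let base = cgj - 1;
  while (base >= 0 && !BASE_LETTER.test(word.charAt(base))) base--;
  if (base < 0) return null;

  // End of that letter's marks after the CGJ
  let end = cgj + 1;
  while (end < word.length && MARK.test(word.charAt(end))) end++;

  const before = word.slice(base + 1, cgj).split("");
  const after = word.slice(cgj + 1, end).split("");
  const shared = before.filter((m) => SHARED_MARKS.indexOf(m) !== -1);
  const firstLayer = before.filter((m) => SHARED_MARKS.indexOf(m) === -1);

  const prefix = word.slice(0, base + 1); // everything up to and including the letter
  const suffix = word.slice(end);
  return [
    prefix + shared.concat(firstLayer).join("") + suffix, // qamats + under-accent
    prefix + shared.concat(after).join("") + suffix, // patah + over-accent
  ];
}
```

For מִתָּ֑͏ַ֜חַת this yields one reading with qamats and etnahta and another with pataḥ and geresh, keeping the dagesh in both.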

benemanuel commented 4 months ago

Ben, please note that the writers of these original handwritten manuscripts did not think that way. The hypnosis that two taamim have no place on one word and cannot be read together is just that, a hypnosis. There also exist theories that use ALL taamim ALWAYS.

Avi


charlesLoder commented 4 months ago

The API in this issue is really just about being able to query the text and describe it well. To reduce user friction, I prefer not to throw errors, or at least to provide an escape hatch. See this issue for an example: even though the sheva in texts derived from L is clearly meant to be a gaya (as in MAM), I prefer to handle it in some way, even if it's not "correct."

This module is meant to be lower-level, so no matter how unfriendly the text is, it should work in some way if possible. In the transliteration package, a user can decide how they want to handle output, but in this package, I generally just want to make APIs available to be used elsewhere.
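One possible shape for such an escape hatch, reconciling the options raised in this thread (return the first item, optionally with a warning, or throw), is a hypothetical `pickSingle` helper that singular accessors could call, defaulting to the lenient behavior. The policy names are assumptions, not an existing havarotjs option.

```typescript
// Hypothetical policy for singular accessors (.vowel, .taam, ...) when a
// cluster holds more than one item.
type MultiplePolicy = "first" | "warn" | "error";

function pickSingle<T>(
  items: T[],
  policy: MultiplePolicy = "first",
  onWarn: (message: string) => void = () => {}
): T | null {
  if (items.length > 1) {
    const message = `expected at most one item, found ${items.length}`;
    if (policy === "error") throw new Error(message); // strict: refuse
    if (policy === "warn") onWarn(message); // lenient, but report it
  }
  return items.length > 0 ? items[0] : null; // otherwise just take the first
}
```

Because the default never throws, a low-level package can stay permissive while a downstream tool such as a transliterator opts into `"warn"` or `"error"`.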

Any thoughts on the representation of the (far more common and more important) dual vowels in yerushalayim and yerushalaymah?

I'll make a separate issue for that. I want to mull over it a bit to understand it.

bdenckla commented 4 months ago

Ben, please note that the writers of these original handwritten manuscripts did not think that way. The hypnosis that two taamim have no place on one word and cannot be read together is just that, a hypnosis. There also exist theories that use ALL taamim ALWAYS.

I think you mean something other than "hypnosis". Maybe you mean something like "fantasy" or "unfounded belief"?

Anyway, we've discussed a lot of different topics in the comments of this issue but the most recent topic was two VOWELS on the same LETTER whereas you are making (IMO somewhat wild) claims about two ACCENTS on the same WORD. These are of course very different topics.

There are many reasons for two accents on the same word, i.e. many different underlying phenomena result in the same superficial (typographic) artifact of two accents on the same word.

I can say with some confidence (and can cite many, many authorities) that there is no tradition in which the two cantillations of the Decalogues represent anything other than an exclusive CHOICE: chant one or chant the other, but not both. Can you cite any authority that suggests that, to the contrary, the two cantillations of the Decalogues represent a possible tradition of simultaneous performance? (By "simultaneous performance" I mean somehow chanting both "at the same time," whatever that would mean.) Are you suggesting the existence of such a tradition?

benemanuel commented 4 months ago

Not traditional, but check out Suzanne Haik-Vantoura's The Music of the Bible Revealed.

