CUNY-CL / latin_scansion

Apache License 2.0

first draft of meter.grm #14

Closed jillianchang closed 3 years ago

jillianchang commented 3 years ago

I'm wondering how I would define syllable boundaries when writing the rules for longus and brevis. For example, how would the grammar know whether it's [wi] or [wir] in virumque?

kylebgorman commented 3 years ago

This is a big question and I'm sorry for not thinking carefully about it earlier.

Like a lot of languages, Latin generally prefers onsets over codas, so a single intervocalic consonant is always an onset and never a coda. What happens with mid-word consonant clusters is a bit more complicated. Most clusters of this form are one coda consonant followed by one or more onset consonants, but the poet is allowed to decide how to parse the sequence called muta cum liquida, i.e., a stop [p, b, t, d, k, g] plus a liquid [r, l]. These can be parsed either as complex onsets or as coda-onset sequences.

The same way that we tag longi and breves, I would suggest also tagging the elements of syllables as onsets, nuclei and codas. More specifically, what I would do is map onset consonants onto O, short monophthongs onto U (looks like the breve marker), long monophthongs onto - (looks like the long marker), and coda consonants onto C.

If you first tag all onsets, then the remaining consonants and vowels are easy. Latin tends to "maximize onsets" at the expense of codas, which makes this less hard than you might think.

I'll send you a draft of how I did this in my pilot off-thread. I'm not sure I got all the details right but it should help you get started...
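
For illustration only, here is a minimal Python sketch of the onset-first tagging described above. It is not the pilot grammar mentioned in this thread and not the project's Thrax code. The function names, the muta_cum_liquida_onset flag, and the choice to emit one tag per onset or coda cluster (as the test strings later in this thread appear to do) are all assumptions. It handles only plain vowels (no nasalized vowels), treats each word separately, and writes long vowels as "-".

```python
import re

VOWELS = "aeiou"
STOPS = set("pbtdkg")
LIQUIDS = set("rl")


def segments(word):
    """Split a word into segments; a vowel plus the length mark is one segment."""
    return re.findall(r"[aeiou]ː?|[^aeiouː]", word)


def tag_word(word, muta_cum_liquida_onset=True):
    """Tag one word with O (onset), U (short vowel), - (long vowel), C (coda).

    Hypothetical helper: each consonant cluster gets a single O or C tag.
    """
    segs = segments(word)
    tags = []
    i, n = 0, len(segs)
    while i < n:
        seg = segs[i]
        if seg[0] in VOWELS:
            tags.append("-" if seg.endswith("ː") else "U")
            i += 1
            continue
        # Collect the whole run of consonants starting at position i.
        j = i
        while j < n and segs[j][0] not in VOWELS:
            j += 1
        run = segs[i:j]
        if i == 0:
            tags.append("O")   # word-initial cluster: all onset
        elif j == n:
            tags.append("C")   # word-final cluster: all coda
        elif len(run) == 1:
            tags.append("O")   # single intervocalic consonant: onset
        elif (len(run) == 2 and run[0] in STOPS and run[1] in LIQUIDS
              and muta_cum_liquida_onset):
            tags.append("O")   # stop + liquid parsed as a complex onset
        else:
            tags.append("CO")  # one coda consonant, the rest is onset
        i = j
    return "".join(tags)


def tag_line(line):
    """Tag each space-separated word independently."""
    return " ".join(tag_word(w) for w in line.split())
```

Under these assumptions, tag_word("wirumkwe") tags the r as an onset, so the first syllable is [wi] rather than [wir]. Passing muta_cum_liquida_onset=False switches a stop-plus-liquid cluster to the coda-onset parse.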

kylebgorman commented 3 years ago

Looks fine to me. I guess you have to debug the pieces. Watch out for spaces!

On Wed, Jul 21, 2021 at 8:40 PM jillianchang wrote:

In grammars/meter.grm https://github.com/CUNY-CL/LatinScansion/pull/14#discussion_r674432118:

test_weight_3 = AssertEqual[
-  "ajraːtoːs jamkwe ekskiːsaː trabe firma kawaːwit" @ WEIGHT_PARSE,
-  "UCO–O–C OUCOU UCO–O– OUOU OUCOU OUO–OUC"
-];
-test_weight_4 = AssertEqual[
-  "juːdikiũː paridis spreːtajkwe injuːria formaj" @ WEIGHT_PARSE,
-  "O–OUOU– OUOUOUC O–OUCOU UCO–OUU OUCOUC"
-];
-test_weight_5 = AssertEqual[
-  "faːs awt ille sinit superiː reːŋnaːtor olumpiː" @ WEIGHT_PARSE,
-  "O–C UC UCOU OUOUC OUOUO– O–CO–OUC UOUCO–"
+  "O–C OUC –C UC" @ WEIGHT_PARSE,
+  "L L L L"
+]; # Heavy by position.
+
+# Complete foot mapping for "Mūsa, mihī causās memorā, quō nūmine laesō."

Is export SCAN = Optimize[SYLLABLE_PARSE @ WEIGHT_PARSE @ FOOT_TYPE]; a valid statement? I get a nullptr when testing it on pronunciation output, though.

jillianchang commented 3 years ago

That's weird: the tests pass when I test the rules sequentially, but not when I compose them together.

kylebgorman commented 3 years ago

What about when you combine two pieces at once? Anything sensible there?

jillianchang commented 3 years ago

Tried again composing the three together at once, and magically it worked this time :)

kylebgorman commented 3 years ago

Ready for final review?

jillianchang commented 3 years ago

Yep! I’ll commit my changes now.

kylebgorman commented 3 years ago

Oh, I see why now. hexameter should be an acceptor, but the pieces you built it from are transducers. It's supposed to be a "filter" (read: acceptor) that only accepts valid sequences of 6 feet that make a dactylic hexameter. You need this because otherwise SCAN can generate 5- or 7-foot "verses", or trochees in non-final feet, and so on.

It should accept strings like "SDSDST"
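
As a rough illustration of what such a filter accepts, the sketch below writes the same constraint as a Python regular expression over foot letters. It assumes one letter per foot (D = dactyl, S = spondee, T = trochee) and mirrors the shape (dactyl | spondee){5} (spondee | trochee). The name is_hexameter is hypothetical, and this is a stand-in for the acceptor, not the Thrax definition.

```python
import re

# Five feet that are dactyls or spondees, then a final spondee or trochee.
HEXAMETER = re.compile(r"[DS]{5}[ST]")


def is_hexameter(feet: str) -> bool:
    """Return True if the foot string is a valid six-foot sequence."""
    return HEXAMETER.fullmatch(feet) is not None
```

With this pattern, is_hexameter("SDSDST") holds, while a 5-foot string such as "SDSDS", a 7-foot string such as "SDSDSTS", or a string with a non-final trochee such as "STSDSS" is rejected.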

On Thu, Jul 22, 2021 at 10:39 AM jillianchang wrote:

In grammars/meter.grm https://github.com/CUNY-CL/LatinScansion/pull/14#discussion_r674863511:

+# Represents the diphthongs ae, oe, au, eu, ou, ui, and ei.
+diphthong = "aj" | "oj" | "aw" | "ew" | "ow" | "uj" | "ej" | "ẽːj";
+
+# Long vs. short syllables.
+longus = (consonant* short_monophthong consonant) |
+         (consonant long_monophthong consonant) |
+         (consonant diphthong consonant)
+         : "L"; #FIXME
+brevis = (consonant* short_monophthong) : "B";
+
+# Foot types.
+dactyl = longus brevis brevis;
+spondee = longus longus;
+trochee = longus brevis;
+
+hexameter = (dactyl | spondee){5} (spondee | trochee);

Including hexameter in SCAN gives nullptr (?)

test_scan_1 = AssertEqual[
  "muːsa mihiː kawsaːs memoraː kwoː nuːmine lajsoː" @ SCAN,
  ""
];

(not sure what the output is supposed to be, but it's currently null anyway)

kylebgorman commented 3 years ago

It's possible you can fix hexameter using Thrax's output-projection function, which I think looks like:

Project[..., 'output']

where ... is a placeholder for the FST you want to output-project.

jillianchang commented 3 years ago

So essentially the pieces I defined hexameter with are transducers, which makes hexameter invalid?

kylebgorman commented 3 years ago

Yes, that's right. Projection might help though.

jillianchang commented 3 years ago

Sorry, I'm confused about the project-output function. Would you mind explaining how it relates to hexameter?

kylebgorman commented 3 years ago

Quickly review FSTP section 3.8 for the definition of projection, if you haven't already.

hexameter, as you defined it, is a relation from "weights" (L and B) to feet (S, D, and T). We just want a filter on valid sequences of hexameter feet, though. So if you do something like hexameter = Project[your definition here, 'output']; I think you'll get an acceptor that accepts sequences like SDSDST. Then if you compose FOOT_TYPE @ hexameter you'll restrict foot parsing to sequences that add up to an actual hexameter verse (you won't be able to have 5- or 7-foot lines anymore). I'm sorry, I don't know how to explain it any better than that: load the transducers into Pynini and look at them in a Jupyter notebook if it's not clear.
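
To make projection and the filtering composition concrete without Thrax or Pynini, here is a toy Python model in which a transducer is a finite set of (input, output) pairs and an acceptor is a set of strings. All names are hypothetical, the foot_type sample is made-up data for illustration, and weights are written L and B as in the diff earlier in this thread. Real FSTs are not enumerated like this; the sketch only shows the set-level meaning of the two operations.

```python
from itertools import product

# Assumed weight pattern per foot: dactyl, spondee, trochee.
FOOT_WEIGHTS = {"D": "LBB", "S": "LL", "T": "LB"}


def hexameter_relation():
    """Toy transducer: pairs of (weight string, foot string) for valid lines."""
    pairs = set()
    for body in product("DS", repeat=5):
        for last in "ST":
            feet = "".join(body) + last
            weights = "".join(FOOT_WEIGHTS[f] for f in feet)
            pairs.add((weights, feet))
    return pairs


def project_output(relation):
    """Output projection: keep only the output side, giving an acceptor."""
    return {out for _, out in relation}


def compose_with_acceptor(relation, acceptor):
    """Keep only the pairs whose output string the acceptor allows."""
    return {(inp, out) for inp, out in relation if out in acceptor}


# The filter: a set of foot strings, no longer a relation.
hexameter = project_output(hexameter_relation())

# Made-up sample of an unrestricted foot parser: one 6-foot and one 5-foot line.
foot_type = {
    ("LLLBBLLLBBLLLB", "SDSDST"),
    ("LLLBBLLLBBLL", "SDSDS"),
}

# Analogue of FOOT_TYPE @ hexameter: the 5-foot parse is filtered out.
filtered = compose_with_acceptor(foot_type, hexameter)
```

In this model the relation and its projection differ only in that the projection forgets the weight side, which is why it can be composed after a foot parser as a pure filter.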