biblissima / collatinus

Sources of Collatinus software - Latin lemmatizer, morphological analyzer and scansion
http://outils.biblissima.fr/en/collatinus
GNU General Public License v3.0

Analyse for long vowels / mark ambiguous? #63

Open JimKillock opened 4 years ago

JimKillock commented 4 years ago

Hi there, would it be possible to add a feature to:

  1. Mark long vowels, eg scripsi » scrīpsī (or, if preferred, scrípsí); and optionally
  2. Find and mark text that is ambiguous, eg puella vs puellā; or mala vs māla

The second of these is less important, but it seems the first of these would be quite easy. Also apologies if Collatinus does this and I have misunderstood.

PhVerkerk commented 4 years ago

Dear Jim,

Yes, Collatinus does that, both in its off-line version and on the web. You just have to use the "Scansion" tab (off-line) or the "Scan" button (on-line). Attached is the result with my off-line version. I just took your examples and added "patri", where the "a" is common. Here, I put the words in the "line edit slot", but you can also put them in the upper part of the window (which is mandatory if you have a full text).

When you scan a text, all vowels are marked, both long and short. In the case of scripsi, there is only one solution. For puella, it depends on the case, and the two solutions are given (the most probable one first and the second one in parentheses). The case of mala is worse because it can be a form of four lemmata:

  1. mălus, a, um (844): bad, evil, wicked; ugly; unlucky
  2. māla, ae, f. (33): cheeks, jaws
  3. mălŭm, i, n.: evil, mischief; disaster, misfortune, calamity, plague; punishment; harm/hurt
  4. mālŭm, i, n.: apple; fruit; lemon; quince

so that you have four solutions. Please note that the different solutions correspond to different lemmata or analyses, which is not the case for patri, where the a can be short or long within the same form.

When you scan a text, the usual rules for elision and lengthening are applied. Caution, errors are always possible.

Yours,

    Philippe.



JimKillock commented 4 years ago

Thanks very much Philippe, I got that to work. It would be helpful (though not essential) to limit the marking to just long vowels plus ambiguities. Would that be worth a feature request?

PhVerkerk commented 4 years ago

As a matter of fact, there is a third kind of vowel (or a fourth, if you count the common ones): the "non-existing" ones. Besides the trivial case of the "qu" group, there are words such as "sanguis" or "suavis" where the "u" is not counted. A good web-site for prosody is http://www.pedecerto.eu/public/lessico/lessico.

My opinion is thus that marking all the vowels (long, short, common and uncounted) allows one to count the syllables in a word with a smaller margin of error. Removing part of the information is always easy, or at least much easier than restoring lost information. Admittedly, several dictionaries do not mark short vowels (nor vowels that are long by position), but to scan a verse, it is better to mark all the known quantities. If you really want to remove the short marks, you just have to replace the 12 short-marked characters with their unmarked equivalents.

Ph.
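For readers who want to script the post-processing discussed above, here is a minimal Python sketch, not part of Collatinus. It assumes the scanned text marks short vowels with a breve (as in mălŭm) and long vowels with a macron (as in mālŭm). The helper names `strip_breves`, `long_only_readings` and `is_ambiguous` are hypothetical, and the candidate lists in the usage part are typed by hand for illustration, not copied from Collatinus output. Rather than listing the 12 characters explicitly, the sketch removes the combining breve after Unicode decomposition.

```python
import unicodedata

COMBINING_BREVE = "\u0306"  # the short mark, once the text is decomposed (NFD)


def strip_breves(text: str) -> str:
    """Remove short marks (breves) and keep everything else, macrons included."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != COMBINING_BREVE)
    return unicodedata.normalize("NFC", kept)


def long_only_readings(candidates: list[str]) -> list[str]:
    """Distinct readings once short marks are removed, in their original order."""
    return list(dict.fromkeys(strip_breves(c) for c in candidates))


def is_ambiguous(candidates: list[str]) -> bool:
    """True if the candidates still differ when only long vowels are marked."""
    return len(long_only_readings(candidates)) > 1


if __name__ == "__main__":
    # Hand-typed candidate scansions (an assumption, for illustration only).
    examples = {
        "scripsi": ["scrīpsī"],
        "puella": ["puellă", "puellā"],
    }
    for word, candidates in examples.items():
        flag = "ambiguous" if is_ambiguous(candidates) else "unique"
        print(word, long_only_readings(candidates), flag)
```

One caveat: if a common vowel is written with both a macron and a breve, removing only the breve leaves it looking long, so such vowels may need separate handling.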



JimKillock commented 4 years ago

Yes, for sure, it isn’t difficult. My use case is adding long-vowel markers to learner texts. Removing the other markers introduces an extra step, so it would be nice to avoid it, but it is absolutely not essential. I understand this is not the purpose of the software at all, but you may find quite a few people want to use it in this way.