PicusZeus / modern_greek_accentuation

Python 3 library for accenting, syllabification, augment handling in modern Greek words.
MIT License
7 stars 0 forks

count_syllables contribution #7

Open planetis-m opened 2 weeks ago

planetis-m commented 2 weeks ago

Hi friend,

I noticed your excellent repository, and since I happened to write a function for counting syllables, I thought I'd contribute it. Although you already have one, you might consider mine as an alternative, since both seem to give the same results in my (limited) testing and my version is a bit more cleaned up. If you are interested in re-implementing it yourself, I just used the rules given in the teaching aid at https://www.patakis.gr/files/1186890.pdf

However, both seem to trip over the cases below, taken from this link:

άδεια: ά-δει-α
αδειοδότηση: α-δει-ο-δό-τη-ση
αδειοδοτικός: α-δει-ο-δο-τι-κός
αδειοδοτώ: α-δει-ο-δο-τώ
αδειοδωρόσημο: α-δει-ο-δω-ρό-ση-μο
αδειούχος: α-δει-ού-χος

It's not possible to disambiguate between the adjective (f) άδεια and the noun with the same spelling without context. But I guess you already know :)
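To make the cases above concrete, here is a minimal sketch of a syllable counter. It is not the code from either repository, and the name `count_syllables` and the digraph table are my own assumptions. It simply counts vowel nuclei, treating the usual two-letter vowel spellings as one nucleus unless an accent on the first letter or a diaeresis on the second splits them. It never merges an "i" sound with a following vowel, so it matches the hyphenations listed above but cannot produce the two-syllable reading of άδεια.

```python
import unicodedata

VOWELS = set("αεηιουω")
# Two-letter vowel spellings assumed to form a single syllable nucleus.
DIGRAPHS = {"αι", "ει", "οι", "υι", "ου", "αυ", "ευ", "ηυ"}
ACUTE = "\u0301"      # combining acute accent (tonos after NFD decomposition)
DIAERESIS = "\u0308"  # combining diaeresis (dialytika after NFD decomposition)


def _letters(word):
    """Return a list of (base_letter, combining_marks) pairs for the word."""
    result = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if result:
                base, marks = result[-1]
                result[-1] = (base, marks + ch)
        else:
            result.append((ch, ""))
    return result


def count_syllables(word):
    """Count vowel nuclei; hypothetical helper, not the library's function."""
    letters = _letters(word)
    count = 0
    i = 0
    while i < len(letters):
        base, marks = letters[i]
        if base not in VOWELS:
            i += 1
            continue
        count += 1
        i += 1
        if i < len(letters):
            next_base, next_marks = letters[i]
            # An accent on the first vowel or a diaeresis on the second
            # splits what would otherwise be a digraph.
            if (base + next_base in DIGRAPHS
                    and ACUTE not in marks
                    and DIAERESIS not in next_marks):
                i += 1  # consume the second letter of the digraph
    return count


for word in ["άδεια", "αδειοδότηση", "αδειούχος"]:
    print(word, count_syllables(word))
```

Handling the other reading would need extra information (part of speech or a lexicon), which is exactly the context problem described above.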

PicusZeus commented 1 week ago

Thanks for the appreciation and for contributing! Well, yes, it's impossible in Greek to know with certainty how to treat the sound "I" when a vowel follows, and from my understanding (I'm not a native speaker) it can be sometimes treated in different ways by different speakers. This also affects how the accent moves in verbs and nouns (έπιεσα or πίεσα). So I decided to give the function modern_greek_syllabify an optional flag, "true_syllabification" (default True), so that modern_greek_syllabify('αδειοδότηση', true_syllabification=False) gives the correct answer. I think that in Modern Greek (dimotiki) the sound "ee" followed by a vowel, if the accent is not involved, should always be treated as a diphthong, and that this rule is broken by Katharevousa influence (as in η άδεια). Your code surely looks cleaner than mine. I will have some time on the weekend, so I will take a closer look at it and try to apply your contribution. By the way, this library was created mainly as a helper for my other, bigger project, modern_greek_inflexion; maybe you would find it interesting too.

planetis-m commented 1 week ago

έπιεσα

I am not into Greek philology, but I can tell that's not a valid form of the verb πιέζω. You can use the rule that no syllable earlier than the third from the end (the antepenult, προπαραλήγουσα) can ever carry the accent. In 'έ-πι-ε-σα' the accent is on the fourth syllable from the end, which is incorrect. (Some books say "a word can only be accented in the last three syllables"; it's the same thing.)
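The rule above can be checked mechanically once a word is already split into syllables. The following is only a sketch under that assumption: the helper names `accent_position_from_end` and `obeys_three_syllable_rule` are made up for illustration, and the input is expected to be hyphenated like the examples in this thread.

```python
import unicodedata

ACUTE = "\u0301"  # the Greek tonos decomposes to this combining mark in NFD


def has_accent(syllable):
    """True if any letter in the syllable carries an acute accent."""
    return ACUTE in unicodedata.normalize("NFD", syllable)


def accent_position_from_end(hyphenated):
    """Return 1 for the last syllable, 2 for the penult, and so on.

    Returns None when no syllable is accented.
    """
    syllables = hyphenated.split("-")
    for offset, syllable in enumerate(reversed(syllables), start=1):
        if has_accent(syllable):
            return offset
    return None


def obeys_three_syllable_rule(hyphenated):
    """True if the accent falls on one of the last three syllables."""
    position = accent_position_from_end(hyphenated)
    return position is None or position <= 3


print(obeys_three_syllable_rule("έ-πι-ε-σα"))  # accent on the 4th syllable from the end
print(obeys_three_syllable_rule("πί-ε-σα"))    # accent on the 3rd syllable from the end
```

A check like this only works on a given split, so its verdict depends on how the "i" plus vowel sequence was syllabified in the first place.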

I guess you can achieve the 'true_syllabification' behaviour in my code by skipping the first two big if-statements. Btw, my use case was to construct a dictionary for autocomplete based on a corpus filled with technical terms. I've written a few functions that perform a basic cleanup of a few obviously wrong forms. It was made as a quick hack, but I might have overengineered it.

can be sometimes treated in different ways by different speakers.

I can attest to that. I have no clue why κοροϊδία is κο-ροϊ-δί-α; I would have split it as κο-ρο-ϊ-δί-α.