bbloomf / jgabc

http://bbloomf.github.io/jgabc/
The Unlicense

Incorrect English syllabification #48

Open adunning opened 4 years ago

adunning commented 4 years ago

I have noticed that many English words are not being separated correctly:

  1. Visit https://bbloomf.github.io/jgabc/transcriber.html;
  2. With the language set to English, enter a word such as 'alleluia'.

At the moment this is displayed with two syllables instead of four. In other cases words are given too many syllables; e.g. 'ends' gets two.

adunning commented 4 years ago

Related to this is whether final -ed is given its own syllable (de-liv-er-ed v. de-liv-ered). I think this is common enough in adapting chant to make it the default, though perhaps it should be optional.
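
One way the optional behaviour could look, as a rough sketch in Python rather than anything taken from jgabc: the function below takes an already syllabified word and, when the option is on, splits a trailing -ed off the last syllable. The name `split_final_ed`, the `sound_ed` flag and the consonant/vowel guard are all assumptions made for illustration; the guard is a heuristic and will not be right for every word.

```python
VOWELS = set("aeiouy")


def split_final_ed(syllables, sound_ed=True):
    """Return a copy of `syllables` with a final -ed given its own syllable.

    Hypothetical helper, not part of jgabc. The split is only made when the
    last syllable ends in "ed", the letter before "ed" is a consonant, and
    the remaining stem still contains a vowel (so "bed" and "need" are left
    alone).
    """
    if not sound_ed or not syllables:
        return list(syllables)
    last = syllables[-1]
    stem = last[:-2]
    if (
        last.lower().endswith("ed")
        and stem
        and stem[-1].lower() not in VOWELS
        and any(ch in VOWELS for ch in stem.lower())
    ):
        return list(syllables[:-1]) + [stem, last[-2:]]
    return list(syllables)


# Option on (the proposed default) and off.
print("-".join(split_final_ed(["de", "liv", "ered"])))
print("-".join(split_final_ed(["de", "liv", "ered"], sound_ed=False)))
```

With the option on this gives de-liv-er-ed; with it off the input is returned unchanged as de-liv-ered.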

bbloomf commented 4 years ago

English syllabification has a lot of problems. Beyond the examples you give, I think many other words are being syllabified incorrectly, but unfortunately I don't have time to fix it right now.

ftherese commented 3 years ago

Ideally we would just create a database of all the words of the psalter, properly syllabified. I wrote a script that parses English syllables automatically using sed, but it still has a few errors.
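
A lookup table with a rule-based fallback might be structured along these lines. This is a minimal sketch in Python; the `LEXICON` entries, the function names and the vowel-group fallback are placeholders made up for illustration, not the contents of any existing word list or of the sed script.

```python
import re

# Hypothetical hand-checked entries; a real table would cover the psalter.
LEXICON = {
    "alleluia": ["al", "le", "lu", "ia"],
    "ends": ["ends"],
    "delivered": ["de", "liv", "ered"],
}

# Optional consonants followed by one or more vowels (y counted as a vowel).
VOWEL_GROUP = re.compile(r"[^aeiouyAEIOUY]*[aeiouyAEIOUY]+")


def naive_split(word):
    """Crude fallback: cut after each vowel group, keep trailing consonants.

    Only a stand-in for a real rule-based splitter; it makes many mistakes.
    """
    parts = VOWEL_GROUP.findall(word)
    if not parts:
        return [word]
    # Whatever follows the last vowel group stays with the last syllable.
    tail = word[len("".join(parts)):]
    parts[-1] += tail
    return parts


def syllabify(word, lexicon=LEXICON):
    """Prefer the hand-checked table; fall back to the naive rule otherwise."""
    entry = lexicon.get(word.lower())
    if entry is not None:
        return list(entry)
    return naive_split(word)
```

Words found in the table bypass the rules entirely, so corrections such as 'alleluia' (four syllables) and 'ends' (one) would only need a table entry.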

adunning commented 3 years ago

How does the hyphenator at https://juiciobrennan.com/hyphenator/ do it? It's not absolutely perfect, but reasonably reliable.

Occasionally I have seen errors in the Latin as well – not sure if the version at http://gregorio-project.github.io/hyphen-la/ would improve it.

bbloomf commented 3 years ago

@adunning The Latin hyphenator you linked to is what gets used when "Liturgical Latin" is selected, so if you see any errors with that, please report them there. It looks like there is already quite a list of issues, though, and there hasn't been much activity lately: https://github.com/gregorio-project/hyphen-la/issues

The English hyphenator at juiciobrennan.com uses a dictionary, and it is what jgabc had been using. I'm not sure why it no longer works from the transcriber tool.