falabrasil / ufpalign

👄🇧🇷 Forced phonetic alignment in Brazilian Portuguese
MIT License

Handle long utterances without depending solely on increasing the beam #14

Open cassiotbatista opened 1 year ago

cassiotbatista commented 1 year ago

MFA seems to recommend increasing the beam value for long files (some of the related issues are from 2019, but some are more recent). EasyAlign used to apply a macro-segmentation pre-stage before actually aligning (cf. sect. 2.1).

That might be hard on normalized text, though: it has no punctuation, capitalization, line breaks, or annotations of any sort.
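
For illustration only, here is a rough sketch of what the audio side of a macro-segmentation pre-stage could look like: cut the recording at the middle of long pauses, found from per-frame energies. This is not how EasyAlign or MFA does it. The function names, the energy threshold, and the minimum pause length are all hypothetical, and the sketch says nothing about the harder part mentioned above, which is deciding which words of the normalized text belong to each chunk.

```python
from typing import List, Tuple


def find_pause_cuts(
    energies: List[float],
    threshold: float,
    min_pause_frames: int,
) -> List[int]:
    """Return one cut frame index at the middle of each long internal pause.

    `energies` holds one (hypothetical) energy value per frame. A pause is a
    run of frames whose energy is below `threshold`. Leading and trailing
    silence is ignored, since it does not separate two stretches of speech.
    """
    cuts: List[int] = []
    run_start = None  # index where the current low-energy run began
    for i, energy in enumerate(energies):
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run ended at frame i - 1; keep it only if it is long
            # enough and does not start at the very beginning of the file.
            if run_start > 0 and i - run_start >= min_pause_frames:
                cuts.append((run_start + i) // 2)
            run_start = None
    return cuts


def cuts_to_chunks(n_frames: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Turn cut points into half-open (start, end) frame ranges."""
    bounds = [0] + cuts + [n_frames]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Toy energies: silence, speech, a 4-frame pause, speech,
    # a 1-frame dip, speech, trailing silence.
    toy = [0, 0, 5, 5, 5, 0, 0, 0, 0, 5, 5, 0, 5, 5, 0, 0, 0]
    cuts = find_pause_cuts(toy, threshold=1.0, min_pause_frames=3)
    print(cuts_to_chunks(len(toy), cuts))
```

Each resulting chunk could then be aligned separately with a normal beam, provided the transcript can somehow be split to match.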

cassiotbatista commented 1 year ago

Another option to keep an eye on: https://github.com/srinivr/kaldi-long-audio-alignment