State of the field and force alignment literature

chrisbrickhouse commented 2 years ago

Part of a review at openjournals/joss-reviews#3958

[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?

The manuscript identifies a number of related software packages with which this program could interface, but there is a gap in its references to the forced alignment literature. Within linguistics, especially phonetic analysis, forced alignment is an important part of the research pipeline whereby an acoustic signal is segmented and aligned with a text transcript. This then allows corpus queries and phonetic analysis of segments. The paper briefly touches on this when discussing Kaldi (Povey et al. 2011), but the state of the field is broader and this program has important implications for that field. I believe the paper would be improved by further review of that literature.

The most impactful piece of software in that field is the Forced Alignment and Vowel Extraction (FAVE) toolkit (Rosenfelder et al. 2014), which converts orthographic transcriptions to phonetic transcriptions through dictionary lookups using the CMU pronunciation dictionary. This has the downside of not being able to handle out-of-dictionary words, which then require experimenter transcription or data exclusion.
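To make the out-of-dictionary problem concrete, here is a minimal sketch of a dictionary-lookup transcription step. It is not FAVE's actual code: the function name `lookup_transcript` and the three-entry `TOY_DICT` are assumptions made up for illustration, with pronunciations written in the ARPAbet style used by the CMU dictionary.

```python
# Toy pronunciation dictionary in ARPAbet-style notation.
# Hypothetical stand-in for a real dictionary; entries are illustrative only.
TOY_DICT = {
    "the": ["DH", "AH0"],
    "cat": ["K", "AE1", "T"],
    "sat": ["S", "AE1", "T"],
}


def lookup_transcript(words, pron_dict):
    """Return (pronunciations, oov) for a list of orthographic words.

    pronunciations maps each known word to its phone list; oov lists the
    words that have no dictionary entry, in order of first appearance.
    """
    pronunciations = {}
    oov = []
    for word in words:
        key = word.lower()
        if key in pron_dict:
            pronunciations[key] = pron_dict[key]
        elif key not in oov:
            oov.append(key)
    return pronunciations, oov


found, missing = lookup_transcript("The cat sat blorpily".split(), TOY_DICT)
print(found)    # known words with their phone sequences
print(missing)  # words someone would have to transcribe by hand or exclude
```

Every word that lands in `missing` is one the pipeline cannot align without manual transcription, which is the limitation described above.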

Other researchers have been trying to extend forced alignment to underdocumented languages, where a major problem is the lack of grapheme-to-phoneme mappings (Barth et al. 2020) or comprehensive pronunciation dictionaries (Johnson, Di Paolo, and Bell 2018). Building either can be substantial work, and the language, orthographic system, or researcher time can limit the utility of these approaches.

These programs require a task similar to the one performed by this package, but do it in a seemingly different way. Comparing this package to the methods used in those packages will improve the paper by connecting it to a wider body of literature and identifying new potential areas of impact.

This is a really interesting project, and I'm excited to look further into the code!

mmmaat commented 2 years ago

Hi Christian, thanks for your comment. Indeed, the references you are talking about seem interesting and related to the phonemizer software. Just... I didn't know them! So I'll review this field and edit the paper accordingly.

mmmaat commented 2 years ago

See changes here.

I added references to FAVE and the Montreal Forced Aligner to the paper, explaining that using phonemizer instead of a pronunciation dictionary can improve the overall pipeline for compatible languages. Nevertheless, I chose not to introduce the issue of under-resourced languages because, even with the phonemizer, you need the language to be supported.

Do you agree with those corrections or do you expect more details?

Thanks.

chrisbrickhouse commented 2 years ago

I still think you should consider covering how phonemizer can be used for under-resourced languages, but I'll get back to that later.

One thing that is worth mentioning is that MFA also includes functionality for training G2P models. This seems different from how the segments backend in this program handles G2P for languages not supported out-of-the-box. As I understand it, MFA's G2P training requires an existing dictionary with sufficient data to train the model accurately. By contrast, the segments backend only seems to require a file that maps graphemes to their pronunciation. I believe the paper would be improved by comparing how this backend differs from the existing MFA function and discussing what use cases would work better for one or the other.

If my understanding is correct, this is where I think Phonemizer shows promise for lesser documented languages. Compiling a dictionary large enough to train a model on is a large task compared to compiling a grapheme-to-phoneme map. For example, one of my colleagues, Kate Lindsey, worked with the Ende Language Project to develop an orthography and dictionary for this Papuan language, with one of the goals being transparency in grapheme-to-phoneme mappings (Lindsey 2021). In cases like these, compiling a grapheme-to-phoneme map for the segments backend is easier and more reliable than compiling a dictionary and training a model using MFA.
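As a rough illustration of why a grapheme-to-phoneme map is cheap to write for a transparent orthography, here is a minimal sketch of map-based transcription using greedy longest-match. This is not the implementation used by phonemizer or its segments backend; the function `apply_g2p_map` and the map `TOY_MAP` are hypothetical, and the orthography is invented (it is not Ende).

```python
def apply_g2p_map(word, g2p_map):
    """Transcribe a word by greedy longest-match against a grapheme map."""
    longest = max(len(grapheme) for grapheme in g2p_map)
    phones = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first so that digraphs such as
        # "ng" win over their single-letter prefixes.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in g2p_map:
                phones.append(g2p_map[chunk])
                i += size
                break
        else:
            # No grapheme in the map matches at this position.
            raise ValueError(f"no mapping for {word[i]!r} in {word!r}")
    return phones


# Hypothetical map for an invented, transparent orthography.
TOY_MAP = {"a": "a", "i": "i", "k": "k", "n": "n", "ng": "ŋ", "s": "s", "sh": "ʃ"}

print(apply_g2p_map("shanga", TOY_MAP))
```

The whole resource here is a table with a handful of rows, whereas a trainable G2P model needs a dictionary of already-transcribed words as input.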

If I misunderstood how that backend works or you still think this is beyond the scope of your need statement, I won't push you to include it. To me it seems like a problem that this program can help solve, though I understand that automated transcription of Papuan languages is a rather niche use case. I'm unfamiliar with the NLP side of the literature review, but from my work with field linguists, I can see this being a very important tool for them which is why I'd encourage engaging with the literature on lesser documented languages.

mmmaat commented 2 years ago

OK, I see your point about under-documented languages now. Indeed, compiling G2P maps for use with the segments backend is a promising application of that tool. I'm not competent in linguistics (more in TTS or speech modeling), so I didn't get your point at first. So... at the end of the Statement of Need section I added a few words on that point, along with 2 of your references, and one from my lab on the phonetic analysis of a Tsimane corpus (Cristia, 2020). Still... I don't want to go deep into details because the paper is not about methodology or about comparing the accuracy of one method against another; it is just about introducing the software and its functionalities. Actually, I don't know which approach is better for which use case. I'm just offering another tool to the community.

About the G2P training in MFA, it still requires a non-exhaustive lexicon as input and is used to bootstrap the lexicon with missing words. There are also deep learning models (that one for instance) doing G2P. This is a bit different from the phonemizer. Actually trained models could be integrated as new backends.

mmmaat commented 2 years ago

Changes here.

chrisbrickhouse commented 2 years ago

Sorry for the delay! I missed the notification during holiday travel, but LGTM. There's no need to go deep into details, and even just making clear that it's another potential tool is helpful. The wonderful thing about peer review is that we all get to learn from each other. The changes you made look great, and thanks for the conversation and your patience!

mmmaat commented 2 years ago

Don't worry about the delay, and thanks again for reviewing this paper. You pointed me in a direction I didn't expect and paved the way to new applications for that tool. That was interesting!

bootphon / phonemizer

State of the field and force alignment literature #91