cassiotbatista opened 1 year ago
Both Coqui and ESPnet have been a pain so far, the former more than the latter.
Coqui can generate alignments externally with a Tacotron model, as in FastSpeech v1, but the default behaviour is to train an alignment head end to end (ref?). Besides, its character utils seem to have moved on while the script that computes attention masks has not, and I think the latter is importing outdated modules. The plan was to take a look at what kind of alignments Coqui produces with Tacotron, so I could later reproduce them in the same format with MFA, but right now I can't get any of it to work.
ESPnet supports MFA, but I'm having trouble with the MFA server's Postgres connection (???). Right now it seems like my best option, because the problem appears to be on MFA's side rather than ESPnet's, which should (hopefully) be easier to solve.
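For the record, when the managed Postgres instance gets into a bad state, resetting MFA's server is the first thing I'd try. A hedged CLI fragment, assuming MFA 2.x with its bundled PostgreSQL server; subcommand names and flags may differ across MFA versions, and the paths in the last line are placeholders:

```shell
# Hedged sketch (MFA 2.x assumed): tear down and re-create the
# PostgreSQL cluster that MFA manages internally.
mfa server stop || true   # stop any half-initialized server
mfa server delete         # wipe MFA's managed Postgres cluster
mfa server init           # re-initialize it from scratch
mfa server start

# Then retry alignment (placeholder arguments, not from the thread):
# mfa align CORPUS_DIR DICTIONARY ACOUSTIC_MODEL OUTPUT_DIR
```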
There were complaints that the m/f dataset was too small to draw conclusions from. The idea now is to train a phoneme-based TTS such as FastSpeech 2 using two different forced aligners, then compare the synthetic and original voices with some similarity metric (e.g., PESQ). Training and test data would come from FalaBrasil's Constituição dataset.
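The evaluation loop itself is simple: average a per-utterance similarity score over (original, synthetic) pairs. A minimal stdlib-only sketch of that loop follows; the SNR stand-in metric, the function names, and the pair structure are my assumptions, not anything from this thread. For actual PESQ scores you would swap in `pesq.pesq()` from the `pesq` PyPI package instead of the stand-in.

```python
import math

def snr_db(ref, deg):
    """SNR in dB between equal-length reference and degraded signals.
    Stand-in for PESQ in this sketch; a real run would call
    pesq.pesq() from the 'pesq' package here instead."""
    assert len(ref) == len(deg), "signals must be time-aligned"
    sig = sum(x * x for x in ref)
    noise = sum((x - y) ** 2 for x, y in zip(ref, deg))
    if noise == 0.0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(sig / noise)

def average_similarity(pairs, metric=snr_db):
    """Average the metric over (original, synthetic) waveform pairs,
    one pair per test utterance."""
    scores = [metric(ref, deg) for ref, deg in pairs]
    return sum(scores) / len(scores)
```

Running this once per aligner (MFA vs. Coqui/Tacotron alignments) on the Constituição test set would give one score per aligner to compare.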