JoFrhwld / FAVE

A repository for maintaining the fave-align and fave-extract toolkits
GNU General Public License v3.0
115 stars, 37 forks

DISC compatibility for FAVE-extract #11

Open scjs opened 10 years ago

scjs commented 10 years ago

Update FAVE-extract for compatibility with the DISC (CELEX) phonetic alphabet, for use with non-North American dialects.

scjs commented 10 years ago

Are there Plotnik codes for all of the DISC codes? FAVE has a few explicit references to ARPAbet, but one way to make it more compatible (and to add other transcription systems down the line) might be to convert all of these to the Plotnik codes internally. Unfortunately, DISC and ARPAbet each have codes that the other lacks. @kylebgorman
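To make the idea concrete, here is a minimal sketch of per-system lookup tables that all map onto one shared internal code. Everything in it is an assumption for illustration: the table names, the `to_internal` function, and especially the internal class names, which are placeholders and not real Plotnik codes. The two-entry tables are deliberately tiny and should be checked against the ARPAbet and CELEX documentation before being extended.

```python
from typing import Dict

# Placeholder internal vowel-class names (hypothetical; NOT real Plotnik codes).
ARPABET_TO_INTERNAL: Dict[str, str] = {
    "IY": "class_iy",  # ARPAbet IY, the vowel of "beat"
    "IH": "class_i",   # ARPAbet IH, the vowel of "bit"
}

DISC_TO_INTERNAL: Dict[str, str] = {
    "i": "class_iy",   # DISC "i", long /i:/ (illustrative; verify against CELEX docs)
    "I": "class_i",    # DISC "I", short /I/ (note that DISC is case-sensitive)
}

# One table per transcription system; new systems are added here.
TABLES: Dict[str, Dict[str, str]] = {
    "arpabet": ARPABET_TO_INTERNAL,
    "disc": DISC_TO_INTERNAL,
}


def to_internal(label: str, system: str) -> str:
    """Translate a label from the named system into the shared internal code.

    Raises ValueError for labels with no internal equivalent, so that codes
    one alphabet has and the other lacks are reported rather than dropped.
    """
    table = TABLES[system]
    try:
        return table[label]
    except KeyError:
        raise ValueError(f"no internal code for {label!r} in {system!r}") from None
```

With tables like these, `to_internal("IY", "arpabet")` and `to_internal("i", "disc")` land on the same internal class, and the codes without a counterpart surface as explicit errors that someone has to decide how to handle.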

JoFrhwld commented 10 years ago

I think this could fall under a larger goal of mine to make FAVE operate over any arbitrary label set. The way I see this working is:

  1. Moving translation into Plotnik codes out of FAVE-extract, and into a pre-processing procedure on the output of FAVE-align.
  2. Replacing Plotnik contextual coding with simply including segmental and lexical information in the FAVE output.
  3. Providing FAVE-extract with a list of vowel phones and measurement point heuristics.
  4. Providing FAVE with means and covariances for the specialized phone set.

Parts (1) and (2) should be easy enough, part (3) is probably going to be really difficult, and part (4) will require some investment on the part of the researcher at the start of their project, but shouldn't be so bad. The effect of the "priors" on the overall formant estimation process is relatively weak, so only about 5 or 10 measurements per vowel should really be necessary.
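For part (4), the researcher's up-front work amounts to something like the sketch below: take a handful of hand measurements for one vowel and compute a mean vector and a sample covariance matrix. This is only an illustration under stated assumptions. The function name `estimate_prior` is hypothetical, the input is assumed to be one tuple of formant values per token, and nothing here reflects the file format FAVE actually reads its means and covariances from.

```python
from typing import List, Sequence, Tuple


def estimate_prior(
    measurements: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Return (mean vector, sample covariance matrix) for one vowel class.

    `measurements` holds one row per hand-measured token, e.g. (F1, F2).
    Hypothetical helper; not part of FAVE.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements to estimate covariance")
    dim = len(measurements[0])
    if any(len(row) != dim for row in measurements):
        raise ValueError("all measurements must have the same number of values")

    # Per-dimension mean.
    mean = [sum(row[i] for row in measurements) / n for i in range(dim)]

    # Unbiased sample covariance (divides by n - 1).
    cov = [
        [
            sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in measurements)
            / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov
```

Calling this once per vowel in the specialized phone set, with the 5 or 10 measurements mentioned above, would give one mean and one covariance matrix per vowel class.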

chrisbrickhouse commented 2 years ago

@JoFrhwld was this resolved in f5b7e709b0bc7ac680b66626afc2bdf77bc90945? If so this can be closed.

JoFrhwld commented 2 years ago

No, I think these were commits for Windows compatibility. The heavy lifting for allowing DISC codes is spread across a few functions, but key among them is isVowel(), which still references the global VOWELS variable.
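One possible direction, sketched here rather than taken from FAVE's source, is to build the vowel test from an inventory that is passed in, so nothing depends on a module-level global. The names `make_is_vowel` and `strip_stress` are hypothetical, and the two vowel sets are small illustrative subsets, not complete inventories. The sketch assumes ARPAbet labels may carry a trailing stress digit (as in `AE1`), while DISC uses digits as vowel symbols in their own right, so digit stripping has to be optional.

```python
import re
from typing import Callable, Iterable


def make_is_vowel(
    vowels: Iterable[str], strip_stress: bool = False
) -> Callable[[str], bool]:
    """Build a vowel predicate for one label set (hypothetical helper).

    If `strip_stress` is true, a single trailing digit is removed before the
    lookup, which suits ARPAbet labels such as "AE1". Leave it false for
    alphabets like DISC where a digit can itself be a vowel symbol.
    """
    inventory = frozenset(vowels)

    def is_vowel(label: str) -> bool:
        if strip_stress:
            label = re.sub(r"\d$", "", label)
        # Lookup is case-sensitive on purpose: DISC distinguishes "i" from "I".
        return label in inventory

    return is_vowel


# Illustrative subsets only; verify against the ARPAbet and CELEX documentation.
ARPABET_VOWELS_SUBSET = {"IY", "IH", "EY", "AE", "AA"}
DISC_VOWELS_SUBSET = {"i", "I", "1", "{", "#"}

is_arpabet_vowel = make_is_vowel(ARPABET_VOWELS_SUBSET, strip_stress=True)
is_disc_vowel = make_is_vowel(DISC_VOWELS_SUBSET)
```

The point of the closure is that each project chooses its inventory once, and every caller that currently goes through the global would instead receive the predicate it should use.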

JoFrhwld commented 2 years ago

Also, a ton of plotnik.py is implicated.