CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Question Files – General Question #455

Closed: ariter777 closed this issue 5 years ago.

ariter777 commented 5 years ago

Well, I understand the concept of question files, but there is one thing that is very important and I haven't found an answer to: can Merlin understand any category in the question files?

What I mean is, for instance in the default questions-radio_dnn_416.hed, there are questions for fricatives and stops, but not specifically for approximants, etc. If I add those, will Merlin actually use them, or is there a limited set of category names that it understands, and the rest is just thrown away?

simonkingedinburgh commented 5 years ago

On 24 Apr 2019, at 11:01, ariter777 wrote:

What I mean is, for instance in the default questions-radio_dnn_416.hed, there are questions for fricatives and stops, but not specifically for approximants, etc. If I add those, will Merlin actually use them, or is there a limited set of category names that it understands?

There are no built-in phone label sets or phonetic categories in Merlin. These questions simply use string matching (just like HTK or HTS).

So, yes, you could add a question that groups together approximants into a category.

Simon

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
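To make the string-matching point concrete, here is a minimal Python sketch of how an HTS-style `QS` question can be evaluated against a full-context label. It is not Merlin's own parser: the function names (`hts_pattern_to_regex`, `parse_qs_line`, `answer`), the approximant phone symbols and the simplified quinphone labels are all hypothetical, and the sketch assumes that only `*` and `?` act as wildcards, as in HTK/HTS patterns.

```python
import re

def hts_pattern_to_regex(pattern):
    """Compile an HTS-style wildcard pattern into a regex.

    '*' matches any (possibly empty) string and '?' matches exactly one
    character; every other character is matched literally.
    """
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped)

def parse_qs_line(line):
    """Parse a line such as: QS "C-Approximant" {*-w+*,*-l+*}"""
    match = re.match(r'\s*QS\s+"([^"]+)"\s*\{([^}]*)\}', line)
    if match is None:
        raise ValueError(f"not a QS line: {line!r}")
    name = match.group(1)
    patterns = [p.strip() for p in match.group(2).split(",") if p.strip()]
    return name, [hts_pattern_to_regex(p) for p in patterns]

def answer(question, label):
    """A question is true for a label if any of its patterns matches the whole label."""
    _name, regexes = question
    return any(regex.fullmatch(label) for regex in regexes)

# Hypothetical question: is the current (centre) phone an approximant?
approximant_q = parse_qs_line('QS "C-Approximant" {*-w+*,*-y+*,*-r+*,*-l+*}')

# Simplified, hypothetical quinphone labels of the form LL^L-C+R=RR@...
print(answer(approximant_q, "sil^hh-l+ow=sil@1_1"))  # True: centre phone is l
print(answer(approximant_q, "hh^l-ow+sil=x@1_1"))    # False: l is the left phone here
```

Because the match is purely on strings, the question name (`C-Approximant`) carries no meaning by itself; only the patterns decide which labels answer "yes", which is why a newly added category is used just like the built-in-looking ones.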

ariter777 commented 5 years ago


Thanks for the answer. One more thing – is a more elaborate questions file (I'm talking beyond the basic 416) going to cause any noticeable improvement in model (and synthesis) quality? Has that been tested?

simonkingedinburgh commented 5 years ago

On 24 Apr 2019, at 11:29, ariter777 wrote:

Thanks for the answer. One more thing – is a more elaborate questions file (I'm talking beyond the basic 416) going to cause any noticeable improvement in model (and synthesis) quality?

If those questions are querying the existing linguistic context, then no, probably not. It’s not something I would invest research time in.

Instead, you should look for new context that is not captured already. For example, linguistic information outside the current sentence, or information that is not represented in the text at all.

Has that been tested?

In older work done with HMMs, we found that you could throw away most features (i.e., questions) with little detriment (see http://www.cstr.ed.ac.uk/downloads/publications/2012/HengLuSimonKing.pdf). I think this is probably even more true for DNNs. Newer sequence-to-sequence models typically use only the current phone plus a small number of other features, such as word boundaries.
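One way to try the reduced-feature idea on your own data is to prune an existing question file before training and compare the resulting voices. The sketch below is hypothetical (`prune_questions` is not part of Merlin) and assumes the HTS naming convention in which current-phone questions start with `C-`, left-phone questions with `L-`, and so on; which prefixes to keep is something to test, not a known-good setting.

```python
import re

# Matches the question name on an HTS-style QS or CQS line, e.g. QS "C-Vowel" {...}
QUESTION_NAME = re.compile(r'\s*C?QS\s+"([^"]+)"')

def prune_questions(lines, keep_prefixes):
    """Return only the question lines whose names start with one of keep_prefixes.

    Lines that are not QS/CQS questions (blank lines, comments) are dropped.
    """
    prefixes = tuple(keep_prefixes)
    kept = []
    for line in lines:
        match = QUESTION_NAME.match(line)
        if match and match.group(1).startswith(prefixes):
            kept.append(line)
    return kept

# Hypothetical fragment of a question file (quinphone labels LL^L-C+R=RR@...)
full = [
    'QS "L-Vowel" {*^aa-*,*^ae-*}',
    'QS "C-Vowel" {*-aa+*,*-ae+*}',
    'QS "C-Approximant" {*-w+*,*-y+*,*-r+*,*-l+*}',
    'QS "R-Vowel" {*+aa=*,*+ae=*}',
]

reduced = prune_questions(full, ["C-"])
print(len(reduced))  # 2: only the current-phone questions remain
```

Writing `reduced` back out as a new question file and retraining gives a direct, if rough, check of how much the dropped questions were contributing for a particular voice.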

ariter777 commented 5 years ago


Thanks a lot!