AdolfVonKleist / Phonetisaurus

Phonetisaurus G2P
BSD 3-Clause "New" or "Revised" License

forcing exact dictionary lookup #51

Closed: dpny518 closed this issue 4 years ago

dpny518 commented 4 years ago

I have trained a model on the dictionary, but because it learns to generalize, it gets some words that are in the training data wrong, for example "the", "to", and the letters of the alphabet. Is there a way to force an exact match when a word is in the training data, and only generalize for new, unseen words?

AdolfVonKleist commented 4 years ago

Yes, it makes errors like these. You will get better results overall (and faster ones) if you use it in combination with your reference lexicon.
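The combination described above amounts to "look the word up in the reference lexicon first, and only call the G2P model for words that are not in it." Below is a minimal sketch of that idea, not the servlet mentioned in this thread. It assumes a lexicon with one `word<TAB>phone phone ...` entry per line, and it treats the model as a hypothetical callable `g2p(word) -> list of phones`; how you actually invoke Phonetisaurus (CLI or bindings) is left out. Lowercasing keys is also an assumption about your lexicon.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical fallback type: any function mapping a word to a list of phones,
# e.g. a wrapper around your trained G2P model (not a real Phonetisaurus API).
G2PFunc = Callable[[str], List[str]]
Lexicon = Dict[str, List[List[str]]]


def load_lexicon(lines: Iterable[str]) -> Lexicon:
    """Parse assumed 'word<TAB>phone phone ...' lines into word -> pronunciations."""
    lexicon: Lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, pron = line.split("\t", 1)
        # Keep every variant listed for a word (e.g. reduced and full "the").
        lexicon.setdefault(word.lower(), []).append(pron.split())
    return lexicon


def pronounce(word: str, lexicon: Lexicon, g2p: G2PFunc) -> List[List[str]]:
    """Exact lexicon lookup first; fall back to the model only for unseen words."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]  # in-vocabulary: return reference entries verbatim
    return [g2p(key)]        # out-of-vocabulary: let the model generalize
```

With this arrangement, training words like "the" and "to" always come back exactly as they appear in the lexicon, and the model's generalization is only used for words it has never seen. It is also faster, since a dictionary lookup skips decoding entirely.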

There is a small example g2p servlet that shows a simple integration for this:

There are also some better alternatives to this out there these days. I have a transformer-based solution that will be released here before the end of the year. I'll continue to fix bugs here, but 'the future is coming' and this is getting pretty dated.

dpny518 commented 4 years ago

This is great for performance, and the accuracy seems good enough. The question is how much more RAM, CPU, and GPU you would need to increase accuracy just a little by using a neural method.

jpetso commented 4 years ago

@AdolfVonKleist: For the transformer-based solution you mentioned, did you mean end of year 2019 or 2020? Or academic year? Where would I subscribe to make sure I don't miss this upcoming release?