coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Feature request: Scorer dealing with OOV #1949

Open bernardohenz opened 3 years ago

bernardohenz commented 3 years ago

Hi,

My team and I use STT for Brazilian Portuguese, and we ran into problems with consecutive OOV (out-of-vocabulary) words. When the decoder receives two or more OOV words in a row, it enters a state in which it stops accepting any further words.

After some experimentation, I removed the early return of OOV_SCORE (in https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/scorer.cpp#L247) and instead apply a penalty on top of the BaseScore result, as follows:

    // encounter OOV
    // if (word_index == lm::kUNK) {
    //   return OOV_SCORE;
    // }

    cond_prob = language_model_->BaseScore(in_state, word_index, out_state);
    // penalize OOV words instead of returning early
    if (word_index == lm::kUNK) {
      cond_prob -= 10;
    }

I believe there could be a better solution for this, so I am opening this issue to discuss one.
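To make the difference between the two behaviours concrete, here is a toy sketch. It is not the STT or KenLM code: `ToyLm` is a hypothetical unigram table standing in for the language model, and `kHardOovScore`, `kUnkBaseScore` and `kOovPenalty` are assumed values chosen only for illustration (the penalty of 10 mirrors the snippet above).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the language model: word -> log score.
using ToyLm = std::unordered_map<std::string, double>;

constexpr double kHardOovScore = -1000.0;  // assumed stand-in for OOV_SCORE
constexpr double kUnkBaseScore = -5.0;     // assumed LM score for an unknown word
constexpr double kOovPenalty = 10.0;       // penalty used in the snippet above

// Variant A: return a constant as soon as ANY word in the n-gram is OOV.
double score_hard(const ToyLm& lm, const std::vector<std::string>& ngram) {
  assert(!ngram.empty());
  double cond_prob = 0.0;
  for (const std::string& word : ngram) {
    auto it = lm.find(word);
    if (it == lm.end()) return kHardOovScore;
    cond_prob = it->second;  // score of the most recent word
  }
  return cond_prob;
}

// Variant B: keep scoring; only penalize the word that is itself OOV.
double score_soft(const ToyLm& lm, const std::vector<std::string>& ngram) {
  assert(!ngram.empty());
  double cond_prob = 0.0;
  for (const std::string& word : ngram) {
    auto it = lm.find(word);
    const bool is_oov = (it == lm.end());
    cond_prob = is_oov ? kUnkBaseScore : it->second;
    if (is_oov) cond_prob -= kOovPenalty;
  }
  return cond_prob;
}
```

In this toy, with a table containing only `casa`, the n-gram `{"xyzzy", "casa"}` gets the hard constant from `score_hard` even though its last word is known, while `score_soft` returns the score of `casa` itself. An OOV word in the context no longer drags down the known words that follow it.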

Since your LM is built from a huge corpus, I suppose your models rarely suffer from OOV words, but I believe many people whose LMs are built from smaller corpora may run into this problem.

reuben commented 3 years ago

Thanks for opening! Did you also make parallel changes to the PathTrie to go with this scoring change here? Could you share them as well so we can have the same starting point?

bernardohenz commented 3 years ago

I experimented with some changes, but as soon as I changed the scorer, I undid the changes to PathTrie.

If I am not mistaken, I only changed it to return a path even when the character is not found in the dictionary. This code is inside get_path_trie:

    if (has_dictionary_) {
      matcher_->SetState(dictionary_state_);
      bool found = matcher_->Find(new_char + 1);
      PathTrie* new_path = new PathTrie;
      new_path->character = new_char;
      new_path->timestep = new_timestep;
      new_path->parent = this;
      new_path->dictionary_ = dictionary_;
      new_path->has_dictionary_ = true;
      new_path->matcher_ = matcher_;
      new_path->log_prob_c = cur_log_prob_c;

      // set spell checker state
      // check to see if next state is final
      auto FSTZERO = fst::TropicalWeight::Zero();
      auto final_weight = dictionary_->Final(dictionary_state_);
      if (found)
        final_weight = dictionary_->Final(matcher_->Value().nextstate);
      bool is_final = (final_weight != FSTZERO);
      if ((is_final && reset) || (!found)) {
        // restart spell checker at the start state
        new_path->dictionary_state_ = dictionary_->Start();
      } else {
        // go to next state
        new_path->dictionary_state_ = matcher_->Value().nextstate;
      }
      children_.push_back(std::make_pair(new_char, new_path));
      return new_path;
    } else { .....
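The state handling in the snippet above can be sketched without OpenFst. This is a simplified illustration, not the PathTrie code: `ToyDictionary` and `advance` are hypothetical names, and a plain character trie with integer states stands in for the dictionary FST.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy dictionary: a character trie with integer states.
struct ToyDictionary {
  int start = 0;
  int next_id = 1;
  std::map<std::pair<int, char>, int> arcs;  // (state, char) -> next state
  std::set<int> finals;                      // states that end a word

  void add_word(const std::string& word) {
    int state = start;
    for (char c : word) {
      auto key = std::make_pair(state, c);
      auto it = arcs.find(key);
      if (it == arcs.end()) it = arcs.emplace(key, next_id++).first;
      state = it->second;
    }
    finals.insert(state);
  }
};

// Returns the dictionary state after consuming `c` from `state`.
// Same branch structure as the snippet: restart at the start state when
// the arc is missing (the prefix is not in the dictionary) or when a word
// has just ended and `reset` is set; otherwise follow the arc.
int advance(const ToyDictionary& dict, int state, char c, bool reset) {
  auto it = dict.arcs.find(std::make_pair(state, c));
  if (it == dict.arcs.end()) return dict.start;  // not found: restart
  const bool is_final = dict.finals.count(it->second) > 0;
  return (is_final && reset) ? dict.start : it->second;
}
```

The point of the change is the first `return`: a character that has no arc in the dictionary still produces a usable state (the start state) instead of ending the path, so the decoder can keep extending the hypothesis after an OOV word.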