ahmetaa / zemberek-nlp

NLP tools for Turkish.

normalize() function in TurkishTextNormalizer #266

Open Baris000-eng opened 3 years ago

Baris000-eng commented 3 years ago

Hi, I do not understand why we take the first three letters of the string if it has no formal analysis and its length is greater than 3. I would be grateful if you could elaborate on that. Best, Barış

Baris000-eng commented 3 years ago

if ((analyses.analysisCount() == 0) && current.length() > 3) {
  List<String> spellCandidates = spellChecker
      .suggestForWord(current, previous, next, lm);
  if (spellCandidates.size() > 3) {
    spellCandidates = new ArrayList<>(spellCandidates.subList(0, 3));
  }
  candidates.addAll(spellCandidates);
}

This is the part of the code that I am asking about.

ahmetaa commented 3 years ago

Well, I think it was because the spell checker generates too many candidates for short words, and they are not reliable. But feel free to change this and see if it works for you. Unfortunately, the text normalization code was not really production ready.
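
For anyone who wants to experiment with these limits, here is a minimal, self-contained sketch of the same guard with both thresholds as parameters. It is not zemberek code: `CandidateLimiter`, `limitedCandidates`, and the fake suggester are hypothetical names, and a plain `Function` stands in for the real `spellChecker.suggestForWord(current, previous, next, lm)` call. Note that `subList(0, 3)` in the quoted snippet trims the list of spell suggestions to its first three entries; it does not cut letters from the word itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of zemberek.
class CandidateLimiter {

  // Returns at most maxCandidates suggestions, and only when the word has
  // no morphological analysis and is longer than minLengthExclusive.
  static List<String> limitedCandidates(
      String word,
      int analysisCount,
      Function<String, List<String>> suggester, // stand-in for the spell checker
      int minLengthExclusive,
      int maxCandidates) {
    List<String> result = new ArrayList<>();
    if (analysisCount == 0 && word.length() > minLengthExclusive) {
      List<String> suggestions = suggester.apply(word);
      // Keep only the first maxCandidates entries of the suggestion list.
      int end = Math.min(suggestions.size(), maxCandidates);
      result.addAll(suggestions.subList(0, end));
    }
    return result;
  }

  public static void main(String[] args) {
    // Fake suggester that always returns five made-up candidates.
    Function<String, List<String>> fake =
        w -> Arrays.asList(w + "1", w + "2", w + "3", w + "4", w + "5");

    // Long word without analysis: suggestions are trimmed to three.
    System.out.println(limitedCandidates("gidiyrum", 0, fake, 3, 3));
    // Short word: the suggester is skipped entirely.
    System.out.println(limitedCandidates("ev", 0, fake, 3, 3));
  }
}
```

With `minLengthExclusive = 3` and `maxCandidates = 3` this mirrors the quoted snippet; changing either value is one way to try out a different cutoff for short words.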