performance improvement and refactoring

udnp commented 6 years ago

this PR improves performance at the points of followings:

PokemonNameCorrector caches at first normalized text needed to improve detecting pokemons.
PokemonNameCorrector stops guessing when it finds the best match pokemon. https://github.com/udnp/GoIV_JP/blob/cbcb57dd3fbfb3e6f80dbc6d31a60786e0d1c18a/app/src/main/java/com/kamron/pogoiv/scanlogic/PokemonNameCorrector.java#L418
```
        if (dist < lowestDist) {
            bestMatchPokemon = trypoke.getValue();
            lowestDist = dist;
            if (dist == 0) break;
        }
```

refactoring points:

normalizing for OCR scanned text are gathered into ScanData class.
normalizing for pokemon info text are gathered into PokemonNameCorrector class.

udnp commented 6 years ago

Here performance improvement means decrease duration of detecting pokemons.

My previous PR #968 has the issue that increases duration of detecting pokemons. So this PR is to improve it.

This is a demo comparing this PR's improvement logic with v5.1.0(prev PR #968), and this PR logic decreases about 35% of duration for detecting pokemons.

quick_iv-pr974-vs-v510

This demo is the case of GoIV_JP(my private GoIV build for Japanese), but improvement logic is the same as GoIV origin.

udnp commented 6 years ago

Especially, this performance issue had been caused when calling PokemonNameCorrector#getNicknameGuess(), in particular, for the cases #2, #4, #5 and #7 in PokemonNameCorrector#getPossiblePokemon().

https://github.com/farkam135/GoIV/blob/081bfbe32630e7406c70a953ea7fdefc897d81ed/app/src/main/java/com/kamron/pogoiv/scanlogic/PokemonNameCorrector.java#L44

        //1. Check if nickname perfectly matches a pokemon (which means pokemon is probably not renamed)

        //2. See if we can get a perfect match with candy name & upgrade cost

        //3.  check correction for Eevee’s Evolution using it's Pokemon Type

        //3.1 Azuril and marill have the same evolution cost, but different types.

        //4. maybe the candy upgrade cost was scanned wrong because the candy icon was interpreted as a number (for
        // example the black candy is not cleaned by the ocr). Try checking if any in the possible evolutions that
        // match without the first character.

        //5.  get the pokemon with the closest name within the evolution line guessed from the candy (or candy and
        // cost calculation).

        //6. Check if the found pokemon should be alolan variant or not.

        //7. All else failed: make a wild guess based only on closest name match

udnp commented 6 years ago

In PR #968 java.text.Normalizer.normalize() and other operations for charactors had been called in the for-loop of PokemonNameCorrector#getNicknameGuess() through StringUtils.normalize(). This had caused increase of detecting pokemon duration. Especially, for-loop for the candy pokemons collection is much affected because this collection size is so long.

https://github.com/farkam135/GoIV/blob/081bfbe32630e7406c70a953ea7fdefc897d81ed/app/src/main/java/com/kamron/pogoiv/scanlogic/PokemonNameCorrector.java#L339-L351

udnp commented 6 years ago

Additionally, I've found that PokemonNameCorrector#getNicknameGuess() continued the for-loop even if the best match pokemon was found. Because Data.levenshteinDistance() returns 0 for the best match pokemon, it should stop and break the for-loop if it finds the best match pokemon like ★ commented line.

         if (dist < lowestDist) { 
             bestMatchPokemon = trypoke; 
             lowestDist = dist; 
             if (dist == 0) break; // ★
         }

udnp commented 6 years ago

About refactoring.

In PokeInfoCalculator, the methods to return either normalized text or not normalized text had been mixed. So as its role, this PR makes PokeInfoCalculator provide raw(not normalized) poekmon text, and pokemon text is normalized by the side of calling PokeInfoCalculator, like PokemonNameCorrector and InputFraction.

And this PR makes PokemonNameCorrector singleton.

udnp commented 6 years ago

Lastly, this PR makes SanData manage scanned text and provide normalized scanned text as its role. As a result, OcrHelper class gets simple a little. 😅

farkam135 / GoIV

performance improvement and refactoring #974