EtienneLamoureux / sc-trade-companion

Companion application for SC Trade Tools
https://sc-trade.tools
GNU General Public License v3.0

Improve parsed-price accuracy #26

Open EtienneLamoureux opened 9 months ago

EtienneLamoureux commented 9 months ago

Situation

Prices are prefixed with the ¤ symbol. This symbol is not in Tesseract's English training set, so it is misread as an arbitrary character. When it is misread as a digit, it becomes a spurious leading digit and inflates the parsed price by roughly an order of magnitude, e.g. ¤900 becomes 2900.

Tasks

  1. Experiment with heuristics to mitigate the issue
    1. Thousands are always separated by a comma , and each group of digits is at most 3 digits long
    2. Only 1 digit is present before the comma , when the price is listed in kilo units K
    3. Others
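
As a starting point for the experiment, heuristics 1 and 2 above could be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: the function name `strip_misread_symbol` is hypothetical, Python is used for brevity, and the sketch assumes that at most one leading character comes from the misread ¤.

```python
def strip_misread_symbol(raw: str) -> str:
    """Drop a leading character that is likely a misread currency symbol.

    Hypothetical helper, not part of sc-trade-companion. Assumes at most
    one leading character was produced by the misread symbol.
    """
    text = raw.strip()

    # A non-digit first character (the symbol itself, or any non-digit
    # misreading of it) carries no value, so it can simply be dropped.
    if text and not text[0].isdigit():
        text = text[1:]

    kilo = text.upper().endswith("K")
    body = text[:-1] if kilo else text
    integer_part = body.split(".", 1)[0]
    first_group = integer_part.split(",", 1)[0]
    has_comma = "," in integer_part

    if len(first_group) > 3:
        # Heuristic 1: no group of digits is longer than 3, so a 4-digit
        # leading group means its first digit is spurious.
        text = text[1:]
    elif kilo and has_comma and len(first_group) > 1:
        # Heuristic 2: in kilo units, only 1 digit precedes the comma.
        text = text[1:]

    return text


print(strip_misread_symbol("2900"))     # 900
print(strip_misread_symbol("21,234K"))  # 1,234K
print(strip_misread_symbol("22,900"))   # 22,900 (left unchanged)
```

The last call shows a limitation: without the K suffix, a misread such as ¤2,900 becoming 22,900 still has valid digit grouping, so these two heuristics cannot detect it. That case would fall under "Others".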

Results

  1. The ¤ character no longer inflates parsed prices