Closed: ngawangtrinley closed this issue 1 year ago
Response from Google: in short, it seems to be fixed in the latest version of the model.
@10zinten could you run GV on just this image with
"language_hints": ["bo-t-i0-handwrit"]
and
"model": "builtin/weekly"
and post the resulting json here so that we can check if the issue is fixed on Google's end?
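For reference, a minimal sketch of what such a request body might look like when sent to the Vision REST endpoint (`images:annotate`). The two settings come from the comment above; the feature type (`DOCUMENT_TEXT_DETECTION`) and the helper name `build_annotate_request` are assumptions, since the thread does not say which feature or client was used. The REST JSON spells the field `languageHints`, whereas the snake_case `language_hints` quoted above is the spelling used by the Python client.

```python
import base64
import json


def build_annotate_request(image_bytes, language_hints, model):
    """Build a JSON-serializable body for a Vision `images:annotate` call.

    Hypothetical helper for illustration; it only builds the payload
    and does not send anything to Google.
    """
    return {
        "requests": [
            {
                # Image content must be base64-encoded in the REST JSON.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    # Assumption: the thread does not state the feature type.
                    {"type": "DOCUMENT_TEXT_DETECTION", "model": model}
                ],
                "imageContext": {"languageHints": language_hints},
            }
        ]
    }


# Placeholder bytes; in practice these would be read from the image file.
body = build_annotate_request(
    b"<image bytes here>", ["bo-t-i0-handwrit"], "builtin/weekly"
)
print(json.dumps(body, indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to diff the exact settings between runs, for example when comparing the three model configs.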
@eroux here is the result
thanks! Well, the duplication disappeared indeed... good news for the future OCRs!
Describe the bug
Google OCR duplicates some symbols in pages. We OCRed using 3 model configs as recommended by Forest and found that the handwrit language_hint (Config 2 and 3 below) duplicates symbols at random places. Here's the config together with the JSON output for this image.
Json output: model_config_1_I1PD958780125.json
2 last lines without duplication:
Json output: model_config_2_I1PD958780125.json
2 last lines with duplication of སྤྲོས་པ་མེད་པ།\n:
Duplication in the symbols:
Json output: model_config_3_I1PD958780125.json
2 last lines with duplication of སྤྲོས་པ་མེད་པ།:
To Reproduce
Steps to reproduce the behavior:
Screenshots
Coming soon...
Additional context
Add any other context about the problem here.
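The kind of duplication reported above (སྤྲོས་པ་མེད་པ། appearing twice in a row in the Config 2 and 3 outputs) could probably be flagged automatically once the text is extracted from the JSON output. The sketch below is only an illustration: `find_adjacent_repeats` is a hypothetical helper, the sample string is made up rather than taken from the actual JSON files, and it only catches a run of characters that is immediately followed by an identical copy, which may not cover every duplication pattern.

```python
import re


def find_adjacent_repeats(text, min_len=4):
    """Return (start_index, repeated_unit) for each run of at least
    `min_len` characters that is immediately followed by an identical copy.

    Hypothetical helper for illustration only.
    """
    # Lazy quantifier: prefer the shortest repeated unit at each position.
    # DOTALL lets a unit span line breaks, e.g. a duplicated "line + \n".
    pattern = re.compile(r"(.{%d,}?)\1" % min_len, re.DOTALL)
    return [(m.start(), m.group(1)) for m in pattern.finditer(text)]


# Made-up sample, not the real OCR output.
unit = "སྤྲོས་པ་མེད་པ།\n"
sample = "abc " + unit + unit + " end"

for start, repeated in find_adjacent_repeats(sample):
    print(start, repr(repeated))
```

The `min_len` threshold is there to avoid flagging legitimately doubled characters or very short syllables; the right value for Tibetan text is a judgment call. The backreference makes the regex slow on very long inputs, so it is best run page by page.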