HlidacStatu / OcrMinion

Client for extracting text from image documents
MIT License

Error in findFileFormatStream: truncated file #9

Closed: pstast closed this issue 5 years ago

pstast commented 5 years ago

Currently all of my tasks are failing, on several different machines:

2019-11-17T19:46:53.399238084Z info: HlidacStatu.OcrMinion.Program[0]
2019-11-17T19:46:53.399296169Z       Image for task[6d410e09-14a3-4649-a364-d50cd867db05] successfully downloaded.
2019-11-17T19:46:53.399306848Z info: HlidacStatu.OcrMinion.Program[0]
2019-11-17T19:46:53.399314307Z       Starting OCR process of #50. task.
2019-11-17T19:46:53.400219834Z info: HlidacStatu.OcrMinion.Program[0]
2019-11-17T19:46:53.400244728Z       Getting new image.
2019-11-17T19:46:53.408875565Z warn: HlidacStatu.OcrMinion.Program[0]
2019-11-17T19:46:53.408924031Z       Returned task is invalid. 
2019-11-17T19:46:53.408937044Z       null
2019-11-17T19:46:53.766286362Z warn: HlidacStatu.OcrMinion.Program[0]
2019-11-17T19:46:53.766306332Z       OCR process of #50. task unsuccessfully finished.
2019-11-17T19:46:53.766641001Z warn: HlidacStatu.OcrMinion.Program[0]
2019-11-17T19:46:53.766665715Z       Tesseract Open Source OCR Engine v4.0.0 with Leptonica
2019-11-17T19:46:53.766668337Z       Error in findFileFormatStream: truncated file
2019-11-17T19:46:53.766670599Z       Error during processing.
suchoss commented 5 years ago

@michalblaha could you please take a look at what the image for task [6d410e09-14a3-4649-a364-d50cd867db05] looks like? It seems tesseract can't cope with it...
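One way to narrow this down on the client side is to check the downloaded file before handing it to tesseract. The sketch below is only an illustration, not code from OcrMinion: the function name `sniff_image`, the signature table, and the 12-byte minimum are all assumptions (as far as I know, Leptonica reports "truncated file" when the file is too short to contain a format header, but the exact threshold used here is a guess).

```python
import os

# Leading bytes of a few common image formats (well-known file signatures).
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "tiff-le": b"II*\x00",
    "tiff-be": b"MM\x00*",
    "gif": b"GIF8",
    "bmp": b"BM",
}

# Assumption: anything shorter than this cannot hold a usable image header.
MIN_HEADER_BYTES = 12


def sniff_image(path):
    """Return a format name if the file starts like a known image, else None."""
    if os.path.getsize(path) < MIN_HEADER_BYTES:
        return None  # empty or truncated download
    with open(path, "rb") as fh:
        header = fh.read(MIN_HEADER_BYTES)
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None  # long enough, but not a format we recognise
```

A worker could call `sniff_image(path)` right after the download and log the file size when it returns `None`, which would show whether the server sent an empty or non-image payload.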

pstast commented 5 years ago

Today, once again, nothing but this all day:

2019-11-24T17:27:31.650657684Z info: HlidacStatu.OcrMinion.Program[0]
2019-11-24T17:27:31.650732628Z       Image for task[34b94891-2cec-4b7c-8a61-a80ac12437b0] successfully downloaded.
2019-11-24T17:27:31.650757757Z info: HlidacStatu.OcrMinion.Program[0]
2019-11-24T17:27:31.650777258Z       Starting OCR process of #30. task.
2019-11-24T17:27:32.045898033Z warn: HlidacStatu.OcrMinion.Program[0]
2019-11-24T17:27:32.045950315Z       OCR process of #30. task unsuccessfully finished.
2019-11-24T17:27:32.045968041Z warn: HlidacStatu.OcrMinion.Program[0]
2019-11-24T17:27:32.045971151Z       Tesseract Open Source OCR Engine v4.0.0 with Leptonica
2019-11-24T17:27:32.045973824Z       Error in findFileFormatStream: truncated file
2019-11-24T17:27:32.045976297Z       Error during processing.
michalblaha commented 5 years ago

Is it still the same task number?

It's a more or less empty image, with no text...

kissi7 commented 5 years ago

I spun up 4 more instances and I see the same problem on all the other (older) ones as well. The task number is different.

info: HlidacStatu.OcrMinion.Program[0]
      Image for task[e0d81e6b-bceb-4c24-823f-c2674ff2dfad] successfully downloaded.
info: HlidacStatu.OcrMinion.Program[0]
      Starting OCR process of #1. task.
info: HlidacStatu.OcrMinion.Program[0]
      Getting new image.
warn: HlidacStatu.OcrMinion.Program[0]
      Returned task is invalid.
      null
warn: HlidacStatu.OcrMinion.Program[0]
      OCR process of #1. task unsuccessfully finished.
warn: HlidacStatu.OcrMinion.Program[0]
      Tesseract Open Source OCR Engine v4.0.0 with Leptonica
      Error in findFileFormatStream: truncated file
      Error during processing.
info: HlidacStatu.OcrMinion.Program[0]
      Image for task[caceb360-ed76-43e5-9c4d-e3c34d79525b] successfully downloaded.
info: HlidacStatu.OcrMinion.Program[0]
      Starting OCR process of #2. task.
info: HlidacStatu.OcrMinion.Program[0]
      Getting new image.
warn: HlidacStatu.OcrMinion.Program[0]
      Returned task is invalid.
      null
warn: HlidacStatu.OcrMinion.Program[0]
      OCR process of #2. task unsuccessfully finished.
warn: HlidacStatu.OcrMinion.Program[0]
      Tesseract Open Source OCR Engine v4.0.0 with Leptonica
      Error in findFileFormatStream: truncated file
      Error during processing.
info: HlidacStatu.OcrMinion.Program[0]
      Image for task[26b3f899-89ad-48a9-bc9c-426383964d49] successfully downloaded.
info: HlidacStatu.OcrMinion.Program[0]
      Starting OCR process of #3. task.
info: HlidacStatu.OcrMinion.Program[0]
      Getting new image.
warn: HlidacStatu.OcrMinion.Program[0]
      Returned task is invalid.
      null
warn: HlidacStatu.OcrMinion.Program[0]
      OCR process of #3. task unsuccessfully finished.
warn: HlidacStatu.OcrMinion.Program[0]
      Tesseract Open Source OCR Engine v4.0.0 with Leptonica
      Error in findFileFormatStream: truncated file
      Error during processing.
michalblaha commented 5 years ago

The bug was on our side, on the server. It was fixed around 19:40 on 24 November. All relevant tasks that were affected have been restarted.

michalblaha commented 5 years ago

There are these reasons:
1) The statistics on the website are cached for 2 minutes, so they usually lag noticeably. The reason is to reduce the load these statistics put on the SQL server.
2) We have slightly throttled task assignment, so that tasks are handed out after 3 seconds at the earliest, sometimes sooner.
3) It still holds that the OCR minions are faster than task assignment. This comes from the nature of the documents being processed right now: most of them can be extracted directly, outside OCR Minion, and they are fairly short.
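To make the first two points concrete, here is a minimal sketch of a time-based cache and a minimum-interval throttle. It is not the server's actual code: the class names `TtlCache` and `Throttle`, the injectable `clock` parameter, and the way the 120-second and 3-second values are wired in are all assumptions for illustration.

```python
import time


class TtlCache:
    """Cache one computed value and recompute it only after ttl seconds."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader    # hypothetical callable, e.g. the SQL query
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires = None     # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._loader()
            self._expires = now + self._ttl
        return self._value


class Throttle:
    """Allow an action at most once per min_interval seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last = None        # time of the last allowed action

    def try_acquire(self):
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False         # too soon, caller gets no task this time
        self._last = now
        return True
```

With `TtlCache(load_stats, ttl=120)` the displayed numbers can be up to two minutes old, and with `Throttle(3)` a minion that asks again immediately is refused. That would be consistent with fast minions frequently seeing no task, though the real server logic may differ (`load_stats` here is a hypothetical name).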