OpenPecha / OCR-Pipelines

[Bug] imagegroup path inconsistency #26

Open 10zinten opened 1 year ago

10zinten commented 1 year ago

In the case of W14322, the image downloader saves images under the imagegroup name with the `I` prefix, while GoogleVisionformatter looks for images under the imagegroup name without the `I` prefix.

for eg:

eroux commented 1 year ago

Ah ok yes. It's an artefact in BDRC's database, see https://github.com/OpenPecha/Toolkit/blob/master/openpecha/buda/api.py#L189

Do we really have to reinvent the wheel every time? It looks like every time something starts working we throw it away and start a new code base that just reproduces each and every bug that we fixed in the first code...

10zinten commented 1 year ago

But GoogleVisionBDRCFileProvider is not using this to convert the imagegroup to a folder name. Therefore the formatter can't find the images, and we are getting empty text files.

10zinten commented 1 year ago

So even if we run Google OCR manually, this issue will still persist: we have to use the latest Google Vision formatter, but it can't find the OCR outputs because of the imagegroup to folder name conversion.
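
To make the mismatch concrete, here is a minimal sketch of a lookup that tolerates both naming styles. It is not the Toolkit's or the pipeline's actual code: `candidate_folder_names` and `find_imagegroup_dir` are hypothetical helpers, the ID `I1234` in the usage note is made up, and the only assumption taken from this thread is that the two components differ by a leading `I`. The real conversion rule lives in the `api.py` linked above and may be stricter.

```python
from pathlib import Path
from typing import List, Optional


def candidate_folder_names(imagegroup: str) -> List[str]:
    """Return the imagegroup ID as given, then the variant that differs by a leading 'I'.

    Hypothetical helper: it does not encode BDRC's actual rule, it only
    lists both spellings so a caller can try each one.
    """
    if imagegroup.startswith("I"):
        return [imagegroup, imagegroup[1:]]
    return [imagegroup, "I" + imagegroup]


def find_imagegroup_dir(base_dir: Path, imagegroup: str) -> Optional[Path]:
    """Return the first existing directory under base_dir that matches either spelling."""
    for name in candidate_folder_names(imagegroup):
        path = base_dir / name
        if path.is_dir():
            return path
    # Neither spelling exists: let the caller report it instead of
    # silently producing an empty text file.
    return None
```

With this, `find_imagegroup_dir(Path("ocr_output"), "1234")` would also pick up a folder named `I1234`. Trying both spellings is only a stopgap; having the downloader and the formatter share a single conversion function would remove the inconsistency at its source.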

eroux commented 1 year ago

Do I really need to fix it myself, or can you do it? If I need to do it I'll rewrite a lot of the code, but that's fine, BDRC really needs a way to run OCR.

kaldan007 commented 1 year ago

@eroux I will look into it asap. It's on me, as I was there during the changes. @10zinten isn't familiar with this part of the code.