NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

LayoutLMv3 correct OCR engine #352

Closed FrancescoSaverioZuppichini closed 9 months ago

FrancescoSaverioZuppichini commented 9 months ago

Hi Niels,

How are you?

I was looking into the LayoutLMv2 notebook, which should be the same for LayoutLMv3, and I noticed this sentence:

"As we can see, results aren't as good as previously. This can be explained by the fact that we're using a different OCR engine than the one that was used during fine-tuning."

I was wondering which OCR engine is the right one to use.

Thanks a lot,

Fra

NielsRogge commented 9 months ago

Hi mate! How are you?

Microsoft used the Azure OCR API with segments enabled. See more info here: https://github.com/microsoft/unilm/issues/838
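To illustrate what "segments enabled" implies for the model inputs, here is a minimal sketch, not Microsoft's actual pipeline. It assumes a hypothetical OCR result already grouped into segments (e.g. text lines), where each segment is a list of `(word, pixel_box)` pairs. Every word receives the bounding box of its whole segment, scaled to the 0-1000 range that LayoutLM-style models expect. The names `normalize_box` and `segment_level_boxes` are made up for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
Segment = Sequence[Tuple[str, Box]]  # hypothetical OCR output: (word, box) pairs


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel box to the 0-1000 coordinate range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def segment_level_boxes(
    segments: Sequence[Segment], width: int, height: int
) -> Tuple[List[str], List[List[int]]]:
    """Assign each word the (normalized) bounding box of its segment."""
    words: List[str] = []
    boxes: List[List[int]] = []
    for segment in segments:
        if not segment:
            continue  # skip empty segments
        # The segment box is the union of the boxes of all words in it.
        seg_box = (
            min(b[0] for _, b in segment),
            min(b[1] for _, b in segment),
            max(b[2] for _, b in segment),
            max(b[3] for _, b in segment),
        )
        norm = normalize_box(seg_box, width, height)
        for word, _ in segment:
            words.append(word)
            boxes.append(list(norm))  # copy so each word owns its list
    return words, boxes


# Made-up example data for a 1000x1000 pixel page.
example_segments = [
    [("Invoice", (100, 50, 200, 80)), ("No.", (210, 52, 250, 78))],
    [("Total", (100, 500, 180, 530))],
]
words, boxes = segment_level_boxes(example_segments, width=1000, height=1000)
```

The resulting `words` and `boxes` could then be passed to a `LayoutLMv3Processor` created with `apply_ocr=False`, for example `processor(image, words, boxes=boxes, return_tensors="pt")`, so that the built-in Tesseract OCR is bypassed.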

FrancescoSaverioZuppichini commented 9 months ago

Thanks a lot. Well, that is a bummer. Damn, I wish this had been OCR-free :)