allanj / LayoutLMv3-DocVQA

Example codebase for fine-tuning LayoutLMv3 on DocVQA

2-D position embeddings #5

Closed (StalVars closed this issue 1 year ago)

StalVars commented 1 year ago

The LayoutLMv3 paper states: "The LayoutLM and LayoutLMv2 adopt word-level layout positions, where each word has its positions. Instead, we adopt segment-level layout positions that words in a segment share the same 2D position since the words usually express the same semantic meaning". However, I see that the built-in OCR layouts here are per word (you pass a separate bbox for each word). Is that correct? Do you think this will affect ANLS scores when using the built-in OCR?

allanj commented 1 year ago

Yes, the boxes are per word, because I can't really obtain the segment information.

I don't think that difference has a large impact on ANLS, though; better OCR gives a much larger improvement.
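For readers wondering what the segment-level layout from the paper would look like in practice, here is a minimal sketch. It assumes the OCR output provides a segment (for example, line) id for each word, which is exactly the information that is not available here. The function name `to_segment_boxes` and the `segment_ids` input are hypothetical and not part of this repository.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def to_segment_boxes(word_boxes: Sequence[Box],
                     segment_ids: Sequence[int]) -> List[Box]:
    """Give every word the union box of the segment it belongs to.

    `segment_ids` is a hypothetical input: one segment id per word,
    as some OCR engines report for lines or text blocks.
    """
    if len(word_boxes) != len(segment_ids):
        raise ValueError("word_boxes and segment_ids must have the same length")

    # First pass: compute the bounding box enclosing all words of each segment.
    union: Dict[int, Box] = {}
    for (x0, y0, x1, y1), seg in zip(word_boxes, segment_ids):
        if seg in union:
            ux0, uy0, ux1, uy1 = union[seg]
            union[seg] = (min(ux0, x0), min(uy0, y0), max(ux1, x1), max(uy1, y1))
        else:
            union[seg] = (x0, y0, x1, y1)

    # Second pass: every word shares its segment's box.
    return [union[seg] for seg in segment_ids]


# Three words: the first two are on the same line, the third is on another line.
word_boxes = [(10, 10, 40, 20), (45, 10, 90, 21), (10, 30, 60, 40)]
segment_ids = [0, 0, 1]
segment_boxes = to_segment_boxes(word_boxes, segment_ids)
print(segment_boxes)
```

The resulting list has one box per word, so it could stand in for the per-word bbox list wherever that is passed to the model, provided the boxes use the same coordinate scale.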

StalVars commented 1 year ago

Hi @allanj, OK. Thanks for open-sourcing your code, and thanks for the quick response.