allanj / LayoutLMv3-DocVQA

Example codebase for fine-tuning layoutLMv3 on DocVQA
49 stars · 3 forks

Accuracy == Bad :( #3

Open logan-markewich opened 1 year ago

logan-markewich commented 1 year ago

I'm struggling as well to get good accuracy out of LayoutLMv3. Compared to v2, v3 actually seems much worse.

Did you ever get any better results?

allanj commented 1 year ago

Not really, though. The OCR quality seems really important.

logan-markewich commented 1 year ago

💔

logan-markewich commented 1 year ago

Yea I just gave up with V3. Been experimenting with LiLT and that works pretty well, but I wish they would publish a large version lol

StalVars commented 1 year ago

Hello @logan-markewich, LiLT is not trained for DocVQA, no?

logan-markewich commented 1 year ago

@StalVars just gotta fine-tune it yourself :) It works pretty well, but I wish there was a large version

StalVars commented 1 year ago

@logan-markewich, thanks for the quick reply. May I ask how good the ANLS score is on dev/test with LiLT?

logan-markewich commented 1 year ago

@StalVars I've been working with a custom dataset (DocVQA + a bunch of my own annotated data).

If I had to approximate it, I'd say LiLT is comparable to LayoutLMV2-base (maybe just a tiny bit worse). But, LiLT has a less restrictive license lol
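For context on the metric being discussed: ANLS (Average Normalized Levenshtein Similarity) is the standard DocVQA score. Each prediction is scored by its best normalized edit-distance similarity against the ground-truth answers, with anything below a 0.5 threshold counted as zero. A minimal pure-Python sketch (not the official evaluation script):

```python
# Minimal sketch of the ANLS metric used for DocVQA.
# Threshold 0.5 follows the DocVQA challenge convention.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def anls(prediction: str, answers: list, threshold: float = 0.5) -> float:
    """Best normalized similarity against any ground-truth answer;
    similarities below the threshold count as 0."""
    pred = prediction.strip().lower()
    best = 0.0
    for gt in answers:
        gt = gt.strip().lower()
        denom = max(len(pred), len(gt))
        if denom == 0:
            sim = 1.0 if pred == gt else 0.0
        else:
            sim = 1.0 - levenshtein(pred, gt) / denom
        best = max(best, sim)
    return best if best >= threshold else 0.0
```

The dataset-level ANLS is then just the mean of this per-question score.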

StalVars commented 1 year ago

@logan-markewich , ok, thanks again for the quick response :)

minhoooo1 commented 1 year ago

@logan-markewich Could you please share the fine-tune code on the DocVQA dataset? Thanks a lot!
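While waiting for the author's code, here is a rough sketch of what fine-tuning LiLT for extractive DocVQA can look like with HuggingFace `transformers`. This is not code from anyone in this thread; the checkpoint name, toy inputs, and the placeholder answer-span labels are all assumptions for illustration.

```python
# Hypothetical sketch: fine-tuning LiLT for extractive DocVQA with
# HuggingFace transformers. Checkpoint name and inputs are assumptions.
from typing import List, Sequence


def normalize_bbox(box: Sequence[int], width: int, height: int) -> List[int]:
    """Scale a pixel-space OCR box (x0, y0, x1, y1) to the 0-1000 grid LiLT expects."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def train_step_sketch():
    """One illustrative training step; requires `torch` and `transformers`."""
    import torch
    from transformers import AutoTokenizer, LiltForQuestionAnswering

    name = "SCUT-DLVCLab/lilt-roberta-en-base"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = LiltForQuestionAnswering.from_pretrained(name)

    # Toy example: a question plus OCR words with pixel boxes from a 1000x1000 page.
    question = "What is the date?"
    words = ["Invoice", "date:", "2019-03-01"]
    boxes = [(40, 40, 160, 60), (170, 40, 240, 60), (250, 40, 400, 60)]

    enc = tokenizer(question.split(), words, is_split_into_words=True,
                    return_tensors="pt", truncation=True)

    # One bbox per token: (0,0,0,0) for question/special tokens,
    # the owning word's normalized box for document tokens.
    bbox = []
    for idx, word_id in enumerate(enc.word_ids(0)):
        if enc.sequence_ids(0)[idx] != 1 or word_id is None:
            bbox.append([0, 0, 0, 0])
        else:
            bbox.append(normalize_bbox(boxes[word_id], 1000, 1000))
    enc["bbox"] = torch.tensor([bbox])

    # Placeholder span labels: real training must map each gold answer
    # string to its start/end token indices in the encoded sequence.
    outputs = model(**enc, start_positions=torch.tensor([0]),
                    end_positions=torch.tensor([0]))
    outputs.loss.backward()  # plug into an optimizer/scheduler loop for real training
```

Unlike LayoutLMv3, LiLT takes no image pixels, so only `input_ids`, `attention_mask`, and the 0-1000 `bbox` tensor are needed; that also means OCR quality matters a lot here too.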