X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache License 2.0

Question about <bbox> in DocDownstream dataset #45

Closed: niiickZ closed this issue 5 months ago

niiickZ commented 5 months ago

The meaning of the values in <bbox> is confusing. They don't appear to be in x1,y1,x2,y2 format, since reading them that way fails to give the correct bbox for most images.

HAWLYQ commented 5 months ago

Hi @niiickZ, <bbox> only appears in the Multi-grained Text Localization tasks in DocStruct4M. There shouldn't be any <bbox> in DocDownstream-1.0. Also, each value x in <bbox> is not an absolute pixel position in the image; x/999 is the normalized position, so the values range from 0 to 999.
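
For illustration, here is a minimal sketch of mapping such values back to pixel coordinates. It assumes the four values are ordered x1,y1,x2,y2, with the x values scaled by the image width and the y values by the image height; that ordering is an assumption, not something stated in this thread. The function name `bbox_to_pixels` is hypothetical and not part of the mPLUG-DocOwl codebase.

```python
def bbox_to_pixels(bbox, width, height, scale=999):
    """Map a normalized box (values in 0..scale) to pixel coordinates.

    Assumption: bbox is ordered (x1, y1, x2, y2). Each value v stands for
    the relative position v / scale along its axis.
    """
    x1, y1, x2, y2 = bbox
    return (
        round(x1 * width / scale),   # left edge in pixels
        round(y1 * height / scale),  # top edge in pixels
        round(x2 * width / scale),   # right edge in pixels
        round(y2 * height / scale),  # bottom edge in pixels
    )


# Example: a normalized box on an image that is 1998 px wide and 999 px tall.
pixel_box = bbox_to_pixels((100, 200, 300, 400), width=1998, height=999)
print(pixel_box)
```

If the boxes still look wrong after this scaling, the value order is the first thing to check against the dataset's own documentation.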

niiickZ commented 5 months ago

Thanks for the answer, it works. And indeed, <bbox> only exists in DocStruct4M; sorry for my typo.