allenai / vila

Incorporating VIsual LAyout Structures for Scientific Text Classification
Apache License 2.0

Fix incorrect bbox scaling #24

Closed lolipopshock closed 2 years ago

lolipopshock commented 2 years ago

In the previous version, there were two errors in the box normalization step:

  1. we scaled the bbox along both the height and width dimensions even if only one dimension was oversized
  2. we performed the scaling after inserting the special tokens, which might have the `[1000,1000,1000,1000]` box

For example, say we have a page of size (700, 1024) (width, height). The previous normalization would:

  1. scale the width dimension from 700 to 1000, even though the width is not oversized
  2. since it normalized after injecting a special token with coordinate 1000, rescale that coordinate to 1000*1000/700≈1428, which defeats the purpose of resizing.
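
The arithmetic above can be reproduced with a small sketch. This is not the repository's code: the function name `old_normalize`, the `[x1, y1, x2, y2]` box layout, and the constant `TARGET` are assumptions made purely for illustration.

```python
TARGET = 1000  # assumed upper bound for normalized coordinates


def old_normalize(boxes, page_width, page_height):
    """Hypothetical reconstruction of the previous behaviour.

    Both axes are always rescaled to TARGET, and the boxes passed in
    already include the special-token box.
    """
    return [
        [
            x1 * TARGET / page_width,
            y1 * TARGET / page_height,
            x2 * TARGET / page_width,
            y2 * TARGET / page_height,
        ]
        for x1, y1, x2, y2 in boxes
    ]


# The special-token box was injected before normalization.
special_box = [1000, 1000, 1000, 1000]
scaled = old_normalize([special_box], page_width=700, page_height=1024)
print(int(scaled[0][0]))  # 1428, which is outside the intended 0..1000 range
```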

In the new fix, we:

  1. make sure to scale only the oversized side
  2. do the scaling before injecting the special tokens.
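
A minimal sketch of the corrected order of operations, under stated assumptions: `normalize_boxes`, `prepare_boxes`, `TARGET`, and `SPECIAL_BOX` are hypothetical names, boxes are assumed to be `[x1, y1, x2, y2]`, and the special token is simply appended at the end. The actual implementation in this PR may differ.

```python
TARGET = 1000  # assumed upper bound for normalized coordinates
SPECIAL_BOX = [1000, 1000, 1000, 1000]  # box assigned to special tokens


def normalize_boxes(boxes, page_width, page_height):
    """Scale an axis only when the page exceeds TARGET along that axis."""
    x_scale = TARGET / page_width if page_width > TARGET else 1.0
    y_scale = TARGET / page_height if page_height > TARGET else 1.0
    return [
        [
            round(x1 * x_scale),
            round(y1 * y_scale),
            round(x2 * x_scale),
            round(y2 * y_scale),
        ]
        for x1, y1, x2, y2 in boxes
    ]


def prepare_boxes(boxes, page_width, page_height):
    """Normalize first, then add the special-token box untouched."""
    normalized = normalize_boxes(boxes, page_width, page_height)
    return normalized + [SPECIAL_BOX]


# Page of size (700, 1024): only the height is oversized.
result = prepare_boxes([[100, 512, 700, 1024]], page_width=700, page_height=1024)
print(result)  # [[100, 500, 700, 1000], [1000, 1000, 1000, 1000]]
```

With this ordering the width coordinates are left unchanged, the height coordinates are scaled by 1000/1024, and the special-token box stays at 1000.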