In the previous version, there were two errors in the box normalization step:
we scaled the bbox along both the height and width dimensions even if only one dimension was oversized
we performed the scaling after inserting the special tokens, which may carry the `[1000,1000,1000,1000]` box
For example, say we have a page of size (700, 1024) (width, height). The previous normalization would:
scale the width dimension to 1000, even though only the height is oversized
since it normalized after injecting a special token with coordinate 1000, rescale that coordinate to 1000*1000/700≈1428, which exceeds 1000 and defeats the purpose of resizing.
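The arithmetic above can be reproduced with a small sketch. This is a hypothetical reconstruction of the old behavior, not the actual code from the repository: the function name `old_normalize`, the `max_coord` parameter, and the use of integer floor division are all assumptions made for illustration.

```python
def old_normalize(boxes, width, height, max_coord=1000):
    """Hypothetical reconstruction of the previous (buggy) normalization."""
    if width > max_coord or height > max_coord:
        # Bug 1: both axes are rescaled, even if only one is oversized.
        return [
            [
                x0 * max_coord // width,
                y0 * max_coord // height,
                x1 * max_coord // width,
                y1 * max_coord // height,
            ]
            for x0, y0, x1, y1 in boxes
        ]
    return boxes


# Bug 2: the special-token box is already in the list when scaling runs.
special_box = [1000, 1000, 1000, 1000]
word_box = [100, 200, 300, 400]

scaled = old_normalize([word_box, special_box], width=700, height=1024)
print(scaled[1])  # x-coordinates of the special box become 1428
```

With a (700, 1024) page, the special box's x-coordinates land at 1428, outside the intended 0 to 1000 range.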
In the new fix, we
make sure to scale only the oversized side
do the scaling before injecting the special tokens.
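The two changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: the names `scale_axis`, `normalize_boxes`, `prepare_boxes`, `MAX_COORD`, and `SPECIAL_BOX` are hypothetical, and appending a single special box at the end of the list is just one possible injection scheme.

```python
MAX_COORD = 1000
SPECIAL_BOX = [1000, 1000, 1000, 1000]  # assumed special-token box


def scale_axis(value, size, max_coord=MAX_COORD):
    # Rescale only when this axis of the page exceeds the coordinate range.
    if size > max_coord:
        return value * max_coord // size
    return value


def normalize_boxes(boxes, width, height, max_coord=MAX_COORD):
    # Each axis is handled independently, so a (700, 1024) page
    # only has its y-coordinates rescaled.
    return [
        [
            scale_axis(x0, width, max_coord),
            scale_axis(y0, height, max_coord),
            scale_axis(x1, width, max_coord),
            scale_axis(y1, height, max_coord),
        ]
        for x0, y0, x1, y1 in boxes
    ]


def prepare_boxes(word_boxes, width, height):
    # Normalize first, then inject the special-token box untouched.
    normalized = normalize_boxes(word_boxes, width, height)
    return normalized + [list(SPECIAL_BOX)]


result = prepare_boxes([[100, 200, 300, 400]], width=700, height=1024)
print(result)
```

In this sketch the x-coordinates of the word box are left as they are because the width (700) is within range, the y-coordinates are scaled by 1000/1024, and the special box keeps its original `[1000,1000,1000,1000]` value.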