InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

How are image bounding boxes preprocessed in 4K HD? #270

Closed. KooSung closed this issue 5 months ago.

KooSung commented 5 months ago

InternLM-XComposer2 4K HD uses Dynamic Image Partition. How are the bounding boxes of the partitioned images preprocessed? For instance, how are the coordinates normalized, and how is the scale transformation applied?

LightDXY commented 5 months ago

Please refer to https://github.com/InternLM/InternLM-XComposer/issues/261.
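The thread itself does not spell out the transform, so the following is only an illustrative sketch of how a box could be carried through a resize-and-pad partition step, not the project's actual code. It assumes: (1) boxes arrive as pixel (x1, y1, x2, y2) on the original image; (2) the image is resized with its aspect ratio preserved so that its width spans `scale` sub-images of 336 px, with at most `hd_num` sub-images in total, and the short side is then padded symmetrically to a multiple of 336; (3) portrait images are handled by swapping axes; and (4) the final coordinates are integers on a 0 to 1000 grid relative to the padded canvas. The names `PATCH`, `hd_resize_shape`, `transform_box`, and the 1000-step grid are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

PATCH = 336  # assumed side length of each sub-image, in pixels (hypothetical)


def hd_resize_shape(width: int, height: int, hd_num: int = 16) -> Tuple[int, int]:
    """Hypothetical aspect-preserving target size for a landscape image.

    Picks the largest column count `scale` such that
    scale * ceil(scale / ratio) sub-images still fit within hd_num,
    then sets the width to scale * PATCH.
    """
    ratio = width / height
    scale = 1
    while scale * math.ceil(scale / ratio) <= hd_num:
        scale += 1
    scale -= 1
    new_w = scale * PATCH
    new_h = int(new_w / ratio)
    return new_w, new_h


def transform_box(box: Sequence[float], image_size: Tuple[int, int],
                  hd_num: int = 16, grid: int = 1000) -> List[int]:
    """Map a pixel box on the original image to integer coordinates on a
    0..grid scale relative to the resized, padded canvas (hypothetical)."""
    width, height = image_size
    x1, y1, x2, y2 = box
    transposed = width < height
    if transposed:
        # Treat a portrait image as a transposed landscape one: swap axes.
        width, height = height, width
        x1, y1, x2, y2 = y1, x1, y2, x2

    new_w, new_h = hd_resize_shape(width, height, hd_num)
    sx, sy = new_w / width, new_h / height  # resize scale factors

    # Pad the short side up to a multiple of PATCH, centering the content,
    # so every y coordinate is shifted down by `top` pixels.
    padded_h = math.ceil(new_h / PATCH) * PATCH
    top = (padded_h - new_h) // 2

    px1, px2 = x1 * sx, x2 * sx
    py1, py2 = y1 * sy + top, y2 * sy + top

    # Normalize by the padded canvas size and quantize to the integer grid.
    out = [round(px1 / new_w * grid), round(py1 / padded_h * grid),
           round(px2 / new_w * grid), round(py2 / padded_h * grid)]
    if transposed:
        # Swap axes back so the result is (x1, y1, x2, y2) again.
        out = [out[1], out[0], out[3], out[2]]
    return out
```

For example, a 1000x500 image is resized to 1680x840 and padded to 1680x1008, so a box covering the whole original image maps to `[0, 83, 1000, 917]`: the padding shows up as a vertical offset. If boxes were instead normalized against the original image size, they would be unaffected by the aspect-preserving resize, and only the padding offset would need handling.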