OpenGVLab / VisionLLM

VisionLLM Series
https://arxiv.org/abs/2305.11175
Apache License 2.0

An issue is found in recurrence. #3

Open Maycbj opened 1 year ago

Maycbj commented 1 year ago

I ran into an issue while trying to reproduce the results. The location tokens, {,... , , ... , }, are used when the tokenizer decodes, so the LLM outputs offset coordinates relative to a point (p+x), but the demo you showed uses absolute coordinates (x1, y1, x2, y2). I think you applied some post-processing to the output text, e.g.

,, to ,, ,,

image
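To make the question concrete, here is a minimal sketch of the kind of post-processing I mean. It is only a guess: the function name `offsets_to_absolute_box`, the argument layout, and the assumption that the four offsets are signed displacements from the point to the top-left and bottom-right corners are all hypothetical and not taken from the VisionLLM code. It also assumes the tokens have already been parsed into integers.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]


def offsets_to_absolute_box(
    point: Tuple[int, int],
    offsets: Tuple[int, int, int, int],
    image_size: Optional[Tuple[int, int]] = None,
) -> Box:
    """Convert point-relative offsets into an absolute (x1, y1, x2, y2) box.

    ASSUMPTION: offsets = (dx1, dy1, dx2, dy2), where (dx1, dy1) moves from
    the point to the top-left corner and (dx2, dy2) to the bottom-right one.
    """
    px, py = point
    dx1, dy1, dx2, dy2 = offsets

    # Shift each corner by the reference point.
    x1, y1 = px + dx1, py + dy1
    x2, y2 = px + dx2, py + dy2

    # Keep the corners ordered even if the offsets came out swapped.
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)

    # Optionally clamp to the image bounds, given as (width, height).
    if image_size is not None:
        width, height = image_size
        x1 = max(0, min(x1, width - 1))
        x2 = max(0, min(x2, width - 1))
        y1 = max(0, min(y1, height - 1))
        y2 = max(0, min(y2, height - 1))

    return (x1, y1, x2, y2)


# Hypothetical values: a point at (100, 120) with four decoded offsets.
box = offsets_to_absolute_box((100, 120), (-30, -40, 50, 60))
clamped = offsets_to_absolute_box((100, 120), (-30, -40, 50, 60), image_size=(128, 128))
```

If the real conversion differs (for example, offsets given as width and height, or coordinates normalized to a fixed bin range), it would help to know which convention the demo uses.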
spacewalkingninja commented 1 year ago

Please share the code and model!