I found an issue while trying to reproduce the results.
It concerns the location tokens, {,... , , ... , }. They are used when the tokenizer decodes: the LLM outputs offset coordinates relative to a point (p+x), but the demo you showed uses absolute coordinates (x1,y1,x2,y2). I think you applied some post-processing to the output text, e.g.