online demo vs opensource codes

LingyvKong / OneChart

[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"

Apache License 2.0


online demo vs opensource codes #7

Closed (HappyLynn closed this issue 6 months ago)

HappyLynn commented 6 months ago

Hello, and thank you for your excellent work. With the released code, I get results that are inconsistent with the online demo linked from GitHub. My understanding is that with do_sample=False generation is deterministic, so the output should not vary between runs. Could you confirm whether the online demo uses the same inference code as the open-source repository? At the moment, parsing fails often, and a relatively large share of the generated JSON is invalid. Thanks again for the great work.
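To put a number on the "failure rate for parsing" mentioned above, a small helper along these lines could be used. This is only a sketch: it assumes each model output is meant to be a bare JSON string, and the function name `invalid_json_rate` and the sample strings are hypothetical, not part of the OneChart code.

```python
import json


def invalid_json_rate(outputs):
    """Return the fraction of strings in `outputs` that fail to parse as JSON.

    Hypothetical helper: assumes every model output should be a bare JSON
    document with no surrounding text.
    """
    if not outputs:
        return 0.0
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except (json.JSONDecodeError, TypeError):
            # JSONDecodeError: malformed JSON; TypeError: non-string input.
            failures += 1
    return failures / len(outputs)


# Made-up sample outputs, two valid and two invalid.
samples = ['{"title": "a"}', '{"title": "a"', "[1, 2]", "not json"]
print(invalid_json_rate(samples))
```

Running the same set of images through both environments and comparing the two rates would show whether the difference comes from the installed library version rather than from sampling.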

LingyvKong commented 6 months ago

Hi, thank you for your interest in our work. The weights and code behind the web demo are the same as those on GitHub, and your understanding is correct. Please check that your transformers version is 4.32.1. Could you also share a failure case?
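One way to check the installed version from Python is sketched below. It uses only the standard library; the helper names `installed_version` and `matches_pinned` are made up for illustration, and the exact string comparison is an assumption about how strictly the version should be pinned.

```python
from importlib import metadata

PINNED_VERSION = "4.32.1"  # the version named in this thread


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pinned(installed, pinned=PINNED_VERSION):
    """True only when the installed version string equals the pinned one exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version()
    if matches_pinned(found):
        print(f"transformers {found} matches the pinned version")
    else:
        print(f"transformers is {found}, expected {PINNED_VERSION}")
```

If the versions differ, `pip install transformers==4.32.1` installs the version mentioned above.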

Best wishes

HappyLynn commented 6 months ago

Thank you for your reply. Have you tested with the latest version of transformers? With the latest version, the results often failed to keep a valid JSON format. With version 4.32.1, as you suggested, the results matched your online demo. One last question, out of curiosity: is such a small model sufficient for chart recognition tasks? Would a larger model yield better results? I've noticed that the current model performs poorly when the image content is more complex.

LingyvKong commented 6 months ago

Hi,

Yes, I have also observed issues with the latest version of transformers, and I am still investigating the cause.
Switching to a larger LLM could improve the model, depending on whether the bottleneck in your case is in the vision encoder or in the decoding part. On this point, you may find our other works, Vary and Vary-toy, useful, as they explore related challenges.
For chart structural extraction specifically, I think smaller models are already sufficient and more economical. The poor performance on complex images is likely because the model was not trained on such data. SFT on hundreds or thousands of similar samples may significantly improve performance. Looking forward to seeing your results.

Best wishes