Open rferrazd opened 1 month ago
Hi, see the app.py on Hugging Face for details on how to output formatted results.
Hello, thanks for helping me with this. I believe that there is no app.py on Hugging Face.
hhhh
In the demo space.
Hello @Ucas-HaoranWei, I am really sorry, but I cannot find it in the demo space. I see no explanation of how to get the output in Markdown format; I have only been able to get the output in LaTeX format. Could you kindly provide a screenshot showing where the syntax for getting Markdown output is explained? I really appreciate the help, and congratulations on this amazing work!
@rferrazd the demo space code is here: https://huggingface.co/spaces/stepfun-ai/GOT_official_online_demo/blob/main/app.py but the output format is still mathpix-markdown.
In case others find this and still don't get it: the model appears to default to mathpix-markdown, which to my untrained eye looked very similar to LaTeX (I thought it was LaTeX).
Hello,
In your paper it seemed that the model was able to extract the text and output it in Markdown format (with subtitles, headings, bold, etc.). I am using your model from Hugging Face and I am not sure how to get the output in Markdown format. I have tried the following: res = model.chat(tokenizer, images[0], ocr_type='format with Markdown'). What is the appropriate syntax to obtain the output in Markdown format? And where can I read more about 'ocr_type', 'ocr_box', 'ocr_color', and 'render'? They are not documented in the GitHub repo.
Thank you for the help!
Have you got any solution?
I'm not an authority on the subject, but I'm 90% sure that the model outputs, by default, a flavor of Markdown called mathpix-markdown, which is essentially a combination of LaTeX and Markdown.
I do not believe it can output pure Markdown. I've got a separate issue going here about parsing the mathpix-markdown response in Python, which would allow you to convert it to your desired format (e.g. pure Markdown, HTML, etc.). I'm hoping the authors respond with a way to parse the response in Python. If by chance you're working in JavaScript/Node, then it's your lucky day.
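In the meantime, here is a rough sketch of the kind of post-processing I mean. It is not the authors' method and not a real mathpix-markdown parser; it assumes the response marks structure with LaTeX-style commands such as \title{...}, \section*{...}, and \textbf{...}, and rewrites only those with regular expressions. The function name mathpix_to_markdown and the heading mapping are my own illustrative choices. Nested braces, tables, and math are left untouched.

```python
import re

# Illustrative mapping (an assumption, not part of any library):
# LaTeX-style structural command -> Markdown heading prefix.
HEADINGS = {
    "title": "#",
    "section": "##",
    "subsection": "###",
    "subsubsection": "####",
}


def mathpix_to_markdown(text: str) -> str:
    """Rewrite a few LaTeX-style commands as plain Markdown.

    Only handles arguments without nested braces; everything else
    (math, tables, unknown commands) is passed through unchanged.
    """

    def heading(match: re.Match) -> str:
        prefix = HEADINGS[match.group(1)]
        return f"{prefix} {match.group(2).strip()}"

    # \section*{Intro} or \section{Intro} -> "## Intro"
    text = re.sub(
        r"\\(title|section|subsection|subsubsection)\*?\{([^{}]*)\}",
        heading,
        text,
    )
    # \textbf{x} -> **x**
    text = re.sub(r"\\textbf\{([^{}]*)\}", r"**\1**", text)
    # \textit{x} or \emph{x} -> *x*
    text = re.sub(r"\\(?:textit|emph)\{([^{}]*)\}", r"*\1*", text)
    return text


if __name__ == "__main__":
    sample = "\\title{Report}\n\\section*{Intro}\nSome \\textbf{bold} text."
    print(mathpix_to_markdown(sample))
```

You would call it on the string returned by model.chat. Inline math such as $x^2$ passes through as-is, which most Markdown renderers with math support can display.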