Closed NZ42 closed 1 month ago
I would also be interested in page-specific Mathpix OCR outputs for the textbook subset, if those are available too.
Thanks for your interest. Unfortunately, the parsing results from the Mathpix API are not page-specific: given a PDF, it returns a single parsed document (e.g., Markdown) containing the whole content of the PDF, not separated by page.
Understood! Would it be possible to release them still? And what about the original LaTeX files? Thanks a lot!
We can release these LaTeX files. Before that, could you tell me what you intend to use them for, and whether it is for commercial purposes? Thanks for your understanding.
That would be lovely! My intended use case is scientific OCR. I would like to see if I can handle the mapping from page image to LaTeX for the creation of an open-source scientific OCR dataset -- no commercial applications planned, just research. Thank you!
Hi, you can download them from the following sharing link. Please DO NOT use them for commercial purposes, only for research. And feel free to cite our MathPile paper (https://arxiv.org/abs/2312.17120)! :)
https://drive.google.com/file/d/1m1UIhpXT3_9LR61BN3JB0xyUJZqVzumR/view?usp=sharing
Thank you!
Fantastic work, thanks for the release.
Is there any chance you could also release the original LaTeX files? Even just privately. I'm interested in this for a scientific OCR project.
Thanks a lot!