Open MonolithFoundation opened 4 months ago
Probably! Are you thinking a training set of pdf => markdown?
It's probably not something I'll be working on right away, but I will be putting together some benchmarks for testing different models. That will probably be in the 50-100 document range, though, and probably not meaningful as a training set.
If it can be used for training, it would be helpful for training an MLLM for OCR and Markdown conversion, like GPT-4o.
(Sent by email in reply to Tyler's message of 07/28/2024 02:59, Re: [getomni-ai/zerox] Is it possible to opensource a markdown converting dataset? (Issue #3))
Hey @MonolithFoundation, you could give this a shot. It's pretty early, but it would help you bootstrap a dataset. I've published a few sample datasets as well. Cheers!
@wizenheimer Hello, this looks like a very nice tool! Thanks for open-sourcing it. However, I didn't find an open link to a PDF (image) -> Markdown (text) dataset out of the box.
Would you consider releasing such a generated dataset?
Is it possible to opensource a markdown converting dataset?