X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache License 2.0

Check out our datasets; I think they might be useful for training models like this #11

Open · wendlerc opened this issue 8 months ago

wendlerc commented 8 months ago

We created some large-scale multimodal datasets that contain OCR annotations. For some, we ran PaddleOCR over LAION images:

  1. https://huggingface.co/datasets/wendlerc/LAION5B-en-PaddleOCR-parquet
  2. https://huggingface.co/datasets/wendlerc/LAION5B-hr-en-PaddleOCR-parquet

For another, we rendered images with Blender:

  3. https://huggingface.co/datasets/wendlerc/RenderedText

And for another, we captioned SynthText with BLIP-2:

  4. https://huggingface.co/datasets/wendlerc/CaptionedSynthText

Do you think they might be useful for tuning your method?

Best, Chris

HAWLYQ commented 3 months ago

Hi @wendlerc, great work! We will consider using these datasets in our next work!

wendlerc commented 3 months ago

Let me know when you need access to the LAION datasets; I have set them to private for now.