X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache License 2.0

Dataset Questions #50

Open Tokyo81 opened 2 months ago

Tokyo81 commented 2 months ago

Do the mPLUG/DocStruct4M and mPLUG/DocDownstream-1.0 datasets contain image files? This cannot be verified on Hugging Face.

HAWLYQ commented 2 months ago

Hi @Tokyo81, I don't really understand what “cannot be verified on the hugging face” means. Could you provide more details?

Tokyo81 commented 2 months ago

I'm asking because the Hugging Face dataset viewer doesn't show whether the dataset contains image files.
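One way to check this without the viewer is to list the files in the dataset repo directly. The sketch below uses `huggingface_hub.list_repo_files` with `repo_type="dataset"` and sorts the paths into archive-like and image-like names. The filename patterns (`ARCHIVE_RE`, `IMAGE_RE`) and the helper names are assumptions for illustration, not the actual layout of these repos.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical name patterns: archives (including split parts such as
# "x.zip.partaa", "x.tar.gz.001", or "x.z01") versus loose image files.
ARCHIVE_RE = re.compile(r"\.(zip|tar|tgz|gz|z\d{2})(\.|$)", re.IGNORECASE)
IMAGE_RE = re.compile(r"\.(png|jpe?g|bmp|gif|webp|tiff?)$", re.IGNORECASE)


def classify_repo_files(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Split repo file paths into archive-like and image-like names."""
    paths = list(paths)
    return {
        "archives": [p for p in paths if ARCHIVE_RE.search(p)],
        "images": [p for p in paths if IMAGE_RE.search(p)],
    }


def list_dataset_files(repo_id: str) -> List[str]:
    """Return every file path in a Hugging Face dataset repo (needs network access)."""
    # Imported lazily so classify_repo_files works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")
```

For example, `classify_repo_files(list_dataset_files("mPLUG/DocStruct4M"))["archives"]` would list any archive files in that repo. A non-empty archive list with few or no loose images would suggest the images are shipped in compressed form.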

HAWLYQ commented 2 months ago


Hi @Tokyo81, this may be because the images are zipped and split into multiple files.
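If the parts are plain byte-level splits of a single zip (for example, as produced by Unix `split`), they can be rejoined by concatenating them in name order and then extracting the result. This is a sketch under that assumption: the part-name pattern and helper names are hypothetical, and multi-volume zips created with `zip -s` (`.z01`, `.z02`, ...) are a different format that this approach does not handle.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List


def join_parts(parts: List[Path], out_path: Path) -> Path:
    """Concatenate byte-split archive parts, in the order given, into one file."""
    with out_path.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream each part so large archives are not held in memory.
                shutil.copyfileobj(src, out)
    return out_path


def join_and_extract(part_dir: Path, pattern: str, dest: Path) -> List[str]:
    """Join parts matching `pattern` (sorted by name) and extract the resulting zip.

    Assumes the part names sort into the correct order (e.g. zero-padded
    suffixes like "part00", "part01" or "aa", "ab").
    """
    parts = sorted(part_dir.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no parts matching {pattern!r} in {part_dir}")
    dest.mkdir(parents=True, exist_ok=True)
    archive = join_parts(parts, dest / "joined.zip")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest / "extracted")
        return zf.namelist()
```

A call such as `join_and_extract(Path("downloads"), "images.zip.part*", Path("data"))` (all names hypothetical) would write `data/joined.zip` and unpack it into `data/extracted`. Sorting by name keeps the code simple but relies on the parts being named so that lexical order matches byte order.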