Is there a complete dataset available?

AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B

Apache License 2.0


Is there a complete dataset available? #7

Closed liuheng0111 closed 3 months ago

liuheng0111 commented 3 months ago

Hi, much of the data is hard to download. Will the images also be open-sourced later?

runninglsy commented 3 months ago

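Because the images are numerous and come from many different sources, we have no plans to provide a packaged download of all of them. Please download the images according to the image source listed at https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#dataset.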

liuheng0111 commented 2 months ago

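@runninglsy I ran into some problems while downloading the images:
1. docmatix-si-900k.jsonl: how do the image paths in this file map to the binary images parsed from the parquet files on Hugging Face?
2. ai2d-mc-15k.json: how do I download the image_support referenced in this file?
3. http://dosa.cds.iisc.ac.in/kvqa/KVQAimgs.tar.gz: the KVQA images cannot be downloaded from this link.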
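For context on the first question, here is a minimal sketch of what writing parquet-embedded image bytes to disk could look like. It is not the Ovis authors' method. The row layout (an `images` list of `{"bytes": ...}` dicts), the `dump_row_images` helper, the `.png` suffix, and the `<prefix>_<row index>_<image index>` naming scheme are all assumptions made for illustration; the paths that docmatix-si-900k.jsonl actually expects are not stated in this thread.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Mapping


def dump_row_images(rows: Iterable[Mapping], out_dir: Path,
                    prefix: str = "row", suffix: str = ".png") -> List[Path]:
    """Write every embedded image of every row to out_dir and return the paths.

    Assumes (hypothetically) that each row has an "images" list whose items
    are dicts holding the raw file content under the key "bytes".
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for row_idx, row in enumerate(rows):
        for img_idx, image in enumerate(row["images"]):
            # Hypothetical naming scheme: <prefix>_<row index>_<image index><suffix>
            path = out_dir / f"{prefix}_{row_idx:07d}_{img_idx}{suffix}"
            path.write_bytes(image["bytes"])
            written.append(path)
    return written


# Stand-in rows so the sketch runs without downloading anything. With real
# data, the rows would come from the parquet shards instead.
demo_rows = [
    {"images": [{"bytes": b"fake-png-0"}]},
    {"images": [{"bytes": b"fake-png-1a"}, {"bytes": b"fake-png-1b"}]},
]
demo_dir = Path(tempfile.mkdtemp())
demo_paths = dump_row_images(demo_rows, demo_dir)
print([p.name for p in demo_paths])
```

Whatever naming scheme is used, it would need to reproduce the relative paths stored in the jsonl file, so the row and image indices here are only placeholders for that mapping.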