allenai / mmc4

MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
MIT License

A subset of image features for mmc4 core? #6

Closed: HenryHZY closed this issue 9 months ago

HenryHZY commented 1 year ago

Hi @jmhessel , the image features for mmc4 are too large for a quick start (~1.8TB).

Do you have a plan to release a subset of image features for mmc4 core?

Thank you very much!

jmhessel commented 1 year ago

Great question! This is a feature we definitely should support; let me take a look and see what I can do.

HenryHZY commented 1 year ago

@jmhessel Thanks for your quick reply. I believe that mmc4 will be of great help to the community.