google-research-datasets / conceptual-captions

Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine-learned image captioning systems.

Contributing. #19

Status: Open · arthurwolf opened this issue 5 months ago

arthurwolf commented 5 months ago

Some models trained on this dataset (such as LLaVA) do not perform well at understanding comic book pages.

Would you be open to a PR with some data related to comic book pages?

The images would be Creative Commons-licensed, of course, or public domain if that's required.

What is the process to offer contributions?

Thanks a lot in advance.