ManifoldRG / Manifold-KB

This repository is a knowledge base that collects key insights and details from other research and implementations, serving both as a reference and as a single place to document the possible paths to achieving a goal.
GNU General Public License v3.0

Neko : Add more details on datasets #4

Open bhavul opened 11 months ago

bhavul commented 11 months ago

Ideally, instead of the simple bullet list, it would be useful to have a table that defines a few key things about each dataset.

As an example, it could look like this, but we can iterate and add or modify anything in the design:

| Dataset | Source | Size | Approx # Tokens | Modalities | Remarks |
| --- | --- | --- | --- | --- | --- |
| Conceptual Captions | https://ai.google.com/research/ConceptualCaptions/ | X GB/TB | XYZ | | |

This would help us identify the right datasets to include while training our Neko model.
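
If it helps keep the table consistent as entries are added, the rows could be kept as structured records and rendered to markdown. The following is only a rough sketch of that idea: the `DatasetEntry` class, its field names, and `to_markdown_table` are hypothetical and not part of this repository, and the size and token values are the same placeholders used in the example above.

```python
from dataclasses import dataclass, fields

# Column headers taken from the proposed table design.
HEADERS = ["Dataset", "Source", "Size", "Approx # Tokens", "Modalities", "Remarks"]


@dataclass
class DatasetEntry:
    """One row of the proposed dataset table (hypothetical schema)."""
    dataset: str
    source: str
    size: str = ""            # e.g. "X GB/TB" until measured
    approx_tokens: str = ""   # placeholder until counted
    modalities: str = ""
    remarks: str = ""


def to_markdown_table(entries):
    """Render the entries as a markdown table, one row per dataset."""
    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "| " + " | ".join("---" for _ in HEADERS) + " |",
    ]
    for entry in entries:
        # Field order in the dataclass matches the order of HEADERS.
        cells = [getattr(entry, f.name) for f in fields(entry)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


entries = [
    DatasetEntry(
        dataset="Conceptual Captions",
        source="https://ai.google.com/research/ConceptualCaptions/",
        size="X GB/TB",       # placeholder from the example table
        approx_tokens="XYZ",  # placeholder from the example table
    ),
]
table = to_markdown_table(entries)
```

Unknown columns are left empty rather than guessed, so gaps in the table stay visible until someone fills them in.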