comet-ml / opik

Open-source end-to-end LLM Development Platform
Apache License 2.0
2.2k stars, 135 forks

[FR]: Toy Datasets for Opik #702

Closed: sherpan closed this issue 2 days ago

sherpan commented 3 days ago

Proposal summary

It would be nice to have some out-of-the-box toy datasets so users can play around and get started with the platform more quickly, similar to load_iris in scikit-learn. To start, I was thinking of a 15-question sample dataset each for Machine Translation and chatbots.

I'm not sure whether users should still be expected to add the dataset to their workspace themselves, or whether they should start out with the sample datasets already in place so they can just call get_dataset.

I have a JSON file we could use for the MT sample dataset: french_phrases.json
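
A minimal sketch of what a load_iris-style loader could look like, assuming the sample data ships inside the package as JSON. Everything here is hypothetical: load_toy_dataset is not an existing Opik function, the "input" and "expected_output" field names are assumptions, and the three rows are placeholders because the actual contents of french_phrases.json are not shown in this thread.

```python
import json
from typing import Dict, List

# Placeholder rows only: the real french_phrases.json is not shown in this
# thread, and the "input"/"expected_output" field names are assumptions.
_FRENCH_PHRASES_JSON = """
[
  {"input": "Bonjour", "expected_output": "Hello"},
  {"input": "Merci beaucoup", "expected_output": "Thank you very much"},
  {"input": "Bonne nuit", "expected_output": "Good night"}
]
"""

# Registry of bundled toy datasets, keyed by name (hypothetical).
_TOY_DATASETS: Dict[str, str] = {"french_phrases": _FRENCH_PHRASES_JSON}


def load_toy_dataset(name: str) -> List[Dict[str, str]]:
    """Return the rows of a bundled toy dataset as a list of dicts.

    Hypothetical helper, not part of the Opik SDK.
    """
    try:
        raw = _TOY_DATASETS[name]
    except KeyError:
        available = ", ".join(sorted(_TOY_DATASETS))
        raise ValueError(
            f"Unknown toy dataset {name!r}; available: {available}"
        ) from None
    return json.loads(raw)


if __name__ == "__main__":
    items = load_toy_dataset("french_phrases")
    print(len(items), "items; first:", items[0])
```

Returning plain dicts keeps the loader independent of any workspace, so the same rows could either be inserted into a dataset by the user or be pre-populated by the platform, which leaves the question above open.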

Motivation

Improve the developer experience so that users can run evaluations without needing their own dataset.

jverre commented 3 days ago

Hey, I think pandas has a few toy datasets that might be relevant, or perhaps some HuggingFace datasets could be used for these use cases as well.

We do have some ideas on our roadmap around letting users make their datasets public, which could solve this as well. For now, I'll close this as won't fix.