comet-ml / opik

Open-source end-to-end LLM Development Platform
Apache License 2.0
1.35k stars, 73 forks

Gidim/ load dataset from file #289

Closed (gidim closed this 1 week ago)

gidim commented 1 week ago

Details

Based on user feedback, this PR adds a method for loading items from a JSONL file and inserting them into a Dataset.

Issues

Resolves #

Testing

I added an end-to-end (e2e) test for the new function, along with a file containing an example dataset.

Documentation

The documentation was updated with the following section:

You can also insert items from a JSONL file:

dataset.read_jsonl_from_file("path/to/file.jsonl")

Each line of the JSONL file must contain exactly one JSON object. For example:

{"input": {"user_question": "Hello, world!"}}
{"input": {"user_question": "What is the capital of France?"}, "expected_output": {"assistant_answer": "Paris"}}
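A minimal sketch of reading a file in this format, assuming each non-blank line holds exactly one JSON object. `load_jsonl_items` is a hypothetical helper for illustration only; it is not the Opik SDK's `read_jsonl_from_file()` implementation.

```python
import json
from typing import Any, Dict, List


def load_jsonl_items(file_path: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: parse a JSONL file into a list of dicts.

    Each non-blank line must contain exactly one JSON object.
    """
    items: List[Dict[str, Any]] = []
    with open(file_path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                # Tolerate blank lines, e.g. a trailing newline at end of file.
                continue
            item = json.loads(line)
            if not isinstance(item, dict):
                # A JSONL dataset row must be an object, not a list or scalar.
                raise ValueError(
                    f"Line {line_number}: expected a JSON object, "
                    f"got {type(item).__name__}"
                )
            items.append(item)
    return items
```

Run on the two-line example above, this sketch would return two dicts, the second of which also carries an `expected_output` key.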

gidim commented 1 week ago

@alexkuzmik Fixed the above. I wasn't sure how much you wanted me to cover in the unit tests, given that the underlying from_json() is already tested. Feel free to let me know if you want me to change anything else.
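The reasoning here, that a thin file-reading wrapper needs little extra unit testing when the JSON-handling function beneath it is already covered, can be sketched with a hypothetical in-memory stand-in. `InMemoryDataset` and the bodies of its `from_json()` and `read_jsonl_from_file()` methods are illustrative assumptions, not the Opik implementation.

```python
import json
from typing import Any, Dict, List


class InMemoryDataset:
    """Hypothetical stand-in for a dataset; not the Opik SDK class."""

    def __init__(self) -> None:
        self.items: List[Dict[str, Any]] = []

    def from_json(self, json_array: str) -> None:
        # Assumed core routine (already unit-tested in this scenario):
        # accept a JSON array of objects and insert them.
        parsed = json.loads(json_array)
        if not isinstance(parsed, list):
            raise ValueError("expected a JSON array")
        self.items.extend(parsed)

    def read_jsonl_from_file(self, file_path: str) -> None:
        # Thin wrapper: read the file, parse each non-blank line as one
        # record, then delegate validation and insertion to from_json().
        with open(file_path, "r", encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        self.from_json(json.dumps(records))
```

Because the wrapper only adds file reading and line splitting, its own tests can focus on those concerns (blank lines, one object per line) and rely on the existing `from_json()` tests for everything else.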

alexkuzmik commented 1 week ago

@gidim FYI, here's the reason for my last changes; it might be useful to know for future contributions :) https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
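The gotcha in the linked guide can be shown with two hypothetical functions (the PR page does not say which argument was changed, so this is a generic illustration). A mutable default such as `[]` is evaluated once, when the function is defined, so it is shared across calls; the usual fix is a `None` sentinel.

```python
from typing import List, Optional


def append_bad(item: str, items: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, so every call
    # that omits `items` appends to the same shared list.
    items.append(item)
    return items


def append_good(item: str, items: Optional[List[str]] = None) -> List[str]:
    # Use None as a sentinel and build a fresh list on each call instead.
    if items is None:
        items = []
    items.append(item)
    return items
```

Called twice without `items`, `append_bad("a")` returns `['a']` and then `append_bad("b")` returns `['a', 'b']`, while `append_good` returns `['a']` and then `['b']`.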