fixie-ai / ultravox

A fast multimodal LLM for real-time voice
https://ultravox.ai
MIT License

Tool for adding new synthetic columns #14

Closed: farzadab closed this 3 months ago

farzadab commented 3 months ago

This tool lets you add new columns to a dataset: each row's fields are filled into a template, and the resulting prompt is fed to gpt-4o to create a synthetic "ground-truth" response.



$ just enrich_ds -d fixie-ai/boolq-audio -s train -b with_explanation -c explanation -t $HF_WRITE_TOKEN

Loading dataset "fixie-ai/boolq-audio", new column name: "explanation", template:
Passage: {passage}

Question: {question}

Answer: {answer}

Provide a short explanation to the question given the passage that entails the answer.
Processing split "train"...

Map (num_proc=16): 100%|██████████| 9427/9427 [16:47<00:00,  9.35 examples/s]

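The log above shows the template being filled from each row of the split before the prompt goes to the model. Below is a minimal sketch of that per-row step, assuming the placeholders are filled with Python's `str.format` from the row's columns. `enrich_row`, `enrich_rows`, and the `generate` callable (which would wrap the gpt-4o request) are hypothetical names for illustration, not the code in this PR. The sketch also runs sequentially over a plain list, whereas the log indicates the real tool uses a parallel `Map` with `num_proc=16`.

```python
from typing import Callable, Dict, List

# Template copied from the log output above; {passage}, {question} and {answer}
# are filled from the columns of each dataset row.
TEMPLATE = (
    "Passage: {passage}\n\n"
    "Question: {question}\n\n"
    "Answer: {answer}\n\n"
    "Provide a short explanation to the question given the passage that entails the answer."
)


def enrich_row(
    row: Dict[str, object],
    template: str,
    new_column: str,
    generate: Callable[[str], str],
) -> Dict[str, object]:
    """Fill the template from the row's columns and store the model reply in new_column."""
    # Raises KeyError if a placeholder has no matching column in the row.
    prompt = template.format(**row)
    # Return a new dict so the input row is left unchanged.
    return {**row, new_column: generate(prompt)}


def enrich_rows(
    rows: List[Dict[str, object]],
    template: str,
    new_column: str,
    generate: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Sequential stand-in for the parallel Map seen in the log."""
    return [enrich_row(r, template, new_column, generate) for r in rows]
```

In a real run, `generate` would send the prompt to gpt-4o and return its text; passing a stub such as `lambda prompt: "stub explanation"` keeps the sketch runnable offline.
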
farzadab commented 3 months ago

I agree that it's best to integrate these two into the same file. I'll see if there's a simple way to combine them.

farzadab commented 3 months ago

PTAL