This PR separates the evaluation step into two steps, `batch-infer` and `eval`.

- `batch-infer` step: lets the fine-tuned local LLM generate responses for a given test prompt
- `eval` step: evaluates the responses generated by the `batch-infer` step via a service LLM (Gemini)

The outputs from each step can be pushed to Hugging Face Dataset repositories.
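For reference, a minimal sketch of pulling either step's pushed output back down with the Hugging Face `datasets` library, assuming the default config and split layout of the example repositories linked below:

from datasets import load_dataset

# Responses generated by the fine-tuned local LLM (batch-infer output)
lm_responses = load_dataset("chansung/lm_response_test")

# Gemini's assessments of those responses (eval output)
eval_results = load_dataset("chansung/eval_dataset_test")

print(lm_responses)
print(eval_results)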
Example CLI for each step

`batch-infer` step:
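A hypothetical sketch of the `batch-infer` invocation, assuming its flags mirror the `eval` step shown below; every flag name other than `--step` is a guess:

# Hypothetical sketch: the flag names below are assumptions mirroring the eval step.
$ python main.py --step batch-infer \
    --lm-response-dataset-id chansung/lm_response_test \
    --push-lm-response-to-hf-hub \
    --hf-token <HF-TOKEN> # env var HF_TOKEN will be used by default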
Generated example Hugging Face Dataset from the `batch-infer` step: https://huggingface.co/datasets/chansung/lm_response_test

`eval` step:

# --hf-token and --gemini-api-key are optional; the env vars HF_TOKEN and
# GEMINI_API_KEY will be used by default.
$ python main.py --step eval \
    --lm-response-dataset-id chansung/lm_response_test \
    --push-eval-to-hf-hub \
    --eval-dataset-id chansung/eval_dataset_test \
    --hf-token <HF-TOKEN> \
    --gemini-api-key <GEMINI-API-KEY>
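Since both keys fall back to environment variables, an equivalent invocation drops the token flags:

$ export HF_TOKEN=<HF-TOKEN>
$ export GEMINI_API_KEY=<GEMINI-API-KEY>
$ python main.py --step eval \
    --lm-response-dataset-id chansung/lm_response_test \
    --push-eval-to-hf-hub \
    --eval-dataset-id chansung/eval_dataset_test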
Generated example Hugging Face Dataset from the `eval` step: https://huggingface.co/datasets/chansung/eval_dataset_test
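What the `eval` step does conceptually (asking Gemini to grade a generated response) can be sketched with the `google-generativeai` client; the model name, prompt, and rubric here are placeholders rather than the ones this PR actually uses:

import os
import google.generativeai as genai

# Conceptual sketch only: the PR's actual prompt, rubric, and parsing differ.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

response_to_grade = "<a response from the batch-infer step>"
verdict = judge.generate_content(
    "Rate the following answer from 1 to 10 and briefly justify the score.\n\n"
    f"Answer: {response_to_grade}"
)
print(verdict.text)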