This PR separates the evaluation step into two steps, `batch-infer` and `eval`.

- `batch-infer` step: lets the fine-tuned local LLM generate responses for a given test prompt
- `eval` step: evaluates the responses generated by the `batch-infer` step via a service LLM (Gemini)

The outputs from each step can be pushed to Hugging Face Dataset repositories.
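For reference, a minimal sketch of pulling either step's pushed output back down with the Hugging Face `datasets` library, assuming the default config and split layout of the example repositories linked below:

from datasets import load_dataset

# Responses generated by the fine-tuned local LLM (batch-infer output)
lm_responses = load_dataset("chansung/lm_response_test")

# Gemini's assessments of those responses (eval output)
eval_results = load_dataset("chansung/eval_dataset_test")

print(lm_responses)
print(eval_results)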
Example CLI for each step

`batch-infer` step:
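A hypothetical sketch of the `batch-infer` invocation, assuming its flags mirror the `eval` step shown below; every flag name other than `--step` is a guess:

# Hypothetical sketch: the flag names below are assumptions mirroring the eval step.
$ python main.py --step batch-infer \
    --lm-response-dataset-id chansung/lm_response_test \
    --push-lm-response-to-hf-hub \
    --hf-token <HF-TOKEN> # env var HF_TOKEN will be used by default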
Generated example Hugging Face Dataset from the `batch-infer` step: https://huggingface.co/datasets/chansung/lm_response_test

`eval` step:

# --hf-token and --gemini-api-key are optional; the env vars HF_TOKEN and
# GEMINI_API_KEY will be used by default.
$ python main.py --step eval \
    --lm-response-dataset-id chansung/lm_response_test \
    --push-eval-to-hf-hub \
    --eval-dataset-id chansung/eval_dataset_test \
    --hf-token <HF-TOKEN> \
    --gemini-api-key <GEMINI-API-KEY>
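Since both keys fall back to environment variables, an equivalent invocation drops the token flags:

$ export HF_TOKEN=<HF-TOKEN>
$ export GEMINI_API_KEY=<GEMINI-API-KEY>
$ python main.py --step eval \
    --lm-response-dataset-id chansung/lm_response_test \
    --push-eval-to-hf-hub \
    --eval-dataset-id chansung/eval_dataset_test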
Generated example Hugging Face Dataset from the `eval` step: https://huggingface.co/datasets/chansung/eval_dataset_test
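What the `eval` step does conceptually (asking Gemini to grade a generated response) can be sketched with the `google-generativeai` client; the model name, prompt, and rubric here are placeholders rather than the ones this PR actually uses:

import os
import google.generativeai as genai

# Conceptual sketch only: the PR's actual prompt, rubric, and parsing differ.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

response_to_grade = "<a response from the batch-infer step>"
verdict = judge.generate_content(
    "Rate the following answer from 1 to 10 and briefly justify the score.\n\n"
    f"Answer: {response_to_grade}"
)
print(verdict.text)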