deep-diver / llamaduo

This project showcases an LLMOps pipeline that fine-tunes a small LLM so it can stand in during an outage of the service LLM.
https://huggingface.co/papers/2408.13467
Apache License 2.0

Enhance/spliteval #11

Closed by deep-diver 6 months ago

deep-diver commented 6 months ago

This PR separates the evaluation step into two steps: batch-infer and eval.

The outputs from each step can be pushed to Hugging Face Dataset repositories.
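As a minimal sketch of what "pushing step outputs" involves, the rows produced by a step have to be reshaped into the column-oriented layout that `datasets.Dataset.from_dict` expects before a push. The field names below (`prompt`, `lm_response`) are hypothetical; the actual schema is defined in this PR's code.

```python
# Sketch: shape batch-infer outputs into a columnar record set
# suitable for building a Hugging Face Dataset. Field names are
# hypothetical, not the PR's actual schema.

def to_columnar(rows):
    """Convert a list of row dicts into a column-oriented dict,
    the layout datasets.Dataset.from_dict expects."""
    if not rows:
        return {}
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns

rows = [
    {"prompt": "Write a sort function", "lm_response": "def sort(xs): ..."},
    {"prompt": "Explain recursion", "lm_response": "Recursion is ..."},
]
columnar = to_columnar(rows)
```

From here, `datasets.Dataset.from_dict(columnar).push_to_hub(repo_id)` would upload the rows to the target dataset repository.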


Example CLI for each step

batch-infer step

$ python main.py --step batch-infer \
--ft-model-id sayakpaul/gemma-2b-sft-qlora-no-robots \
--load-in-8bit \
--test-ds-id sayakpaul/no_robots_only_coding \
--push-lm-responses-to-hf-hub \
--lm-response-dataset-id chansung/lm_response_test \
--lm-response-append \
--hf-token <HF-TOKEN> # env var HF_TOKEN is used by default if this flag is omitted

Example Hugging Face Dataset generated by the batch-infer step: https://huggingface.co/datasets/chansung/lm_response_test
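The `--lm-response-append` flag above suggests two write modes for the response dataset: append new rows to what was already pushed, or replace the dataset outright. A minimal sketch of that semantics, assuming a plain list-of-rows representation (the real implementation operates on Hugging Face Dataset objects):

```python
# Sketch of the --lm-response-append semantics: with append mode on,
# newly generated responses are added after the previously pushed rows;
# with it off, the new rows replace the dataset. The list-of-dicts
# representation here is an assumption for illustration.

def merge_responses(existing, new, append=True):
    """Return the rows that should end up in the response dataset."""
    if append:
        return list(existing) + list(new)
    return list(new)

old_rows = [{"prompt": "p1", "lm_response": "r1"}]
new_rows = [{"prompt": "p2", "lm_response": "r2"}]
merged = merge_responses(old_rows, new_rows, append=True)
```

Append mode lets repeated batch-infer runs accumulate responses in one dataset repository instead of clobbering earlier runs.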

eval step

$ python main.py --step eval \
--lm-response-dataset-id chansung/lm_response_test \
--push-eval-to-hf-hub \
--eval-dataset-id chansung/eval_dataset_test \
--hf-token <HF-TOKEN> \
--gemini-api-key <GEMINI-API-KEY>
# env vars HF_TOKEN and GEMINI_API_KEY are used by default if the flags are omitted

Example Hugging Face Dataset generated by the eval step: https://huggingface.co/datasets/chansung/eval_dataset_test
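The eval step consumes the response dataset produced by batch-infer and scores each response with a judge LLM (the project uses Gemini, hence `--gemini-api-key`). A minimal sketch of that flow with the judge stubbed out; the field names (`prompt`, `lm_response`, `similarity_score`) and the aggregate are hypothetical, not the PR's actual eval schema:

```python
# Sketch of the eval step: pair each LM response with a judge score
# and aggregate. The judge is stubbed here; the project calls Gemini.
# Field names are hypothetical.

def evaluate(responses, judge):
    """Attach a judge score to each response row."""
    records = []
    for row in responses:
        score = judge(row["prompt"], row["lm_response"])
        records.append({**row, "similarity_score": score})
    return records

def mean_score(records):
    """Average judge score across all evaluated responses."""
    return sum(r["similarity_score"] for r in records) / len(records)

responses = [
    {"prompt": "p1", "lm_response": "r1"},
    {"prompt": "p2", "lm_response": "r2"},
]
stub_judge = lambda prompt, response: 4  # stand-in for a Gemini call
records = evaluate(responses, stub_judge)
```

Splitting batch-infer from eval means the (expensive) generation pass can be reused across multiple judge runs, since the eval step only needs the pushed response dataset.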

deep-diver commented 6 months ago

@sayakpaul

Except for the last comment, I have addressed your comments :)

sayakpaul commented 6 months ago

Cool. LGTM. Feel free to merge.