Kipok / NeMo-Skills

A pipeline to improve skills of large language models
https://kipok.github.io/NeMo-Skills/
Apache License 2.0
185 stars, 41 forks

push decontaminator code #88

Closed: wedu-nvidia closed this 1 month ago

wedu-nvidia commented 3 months ago

I developed a pipeline that, for each question in the test data, extracts similar questions from the training data.

The following command retrieves the similar questions and saves them back to the original_test.jsonl file:

python nemo_skills/evaluation/retrieve_similar_question.py \
  train_jsonl_files=/home/wedu/workspace/gsm8k-augmentation-questions/output*.jsonl \
  test_jsonl_files=/home/wedu/workspace/NeMo-Skills/datasets/gsm8k/original_test.jsonl \
  model=multi-qa-MiniLM-L6-cos-v1 \
  device=cuda
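
As a rough illustration of the retrieval step, here is a minimal sketch of top-k cosine-similarity retrieval. It is not the code in retrieve_similar_question.py: the `embed` function is a toy bag-of-words stand-in for the sentence-embedding model named in the command, and the `similar_questions` field name is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(test_questions, train_questions, top_k=2):
    """For each test question, return the top_k most similar training questions."""
    train_vectors = [embed(q) for q in train_questions]
    results = []
    for question in test_questions:
        query = embed(question)
        # Score every training question against the query, highest first.
        scored = sorted(
            zip(train_questions, (cosine(query, v) for v in train_vectors)),
            key=lambda pair: pair[1],
            reverse=True,
        )
        results.append({
            "question": question,
            # Hypothetical field name for the retrieved candidates.
            "similar_questions": [q for q, _ in scored[:top_k]],
        })
    return results
```

With a real embedding model only `embed` and `cosine` would change; the ranking loop stays the same.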

I also created two YAML prompt files that ask the LLM to determine whether a question from the test set and a candidate question are rephrased versions of each other:

nemo_skills/inference/prompt/openai/math-detect-few-shot.yaml
nemo_skills/inference/prompt/openai/math-detect-zero-shot.yaml
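
To make the intended check concrete, here is a small sketch of how per-pair judgments could be combined into a per-question contamination flag. Everything in it is an assumption for illustration: the `similar_questions` field, the `judge` callable (which in practice would wrap an LLM call using one of these prompts), and the rule that a single positive judgment flags the question.

```python
def build_pairs(entry):
    """Pair one test question with each of its retrieved candidates."""
    return [(entry["question"], candidate) for candidate in entry["similar_questions"]]


def is_contaminated(entry, judge):
    """Flag a test question if the judge calls any candidate a rephrasing.

    `judge` is any callable (test_question, candidate) -> bool.
    """
    return any(judge(question, candidate) for question, candidate in build_pairs(entry))


def exact_match_judge(question, candidate):
    """Placeholder judge: equality after lowercasing and whitespace normalization."""
    def normalize(text):
        return " ".join(text.lower().split())

    return normalize(question) == normalize(candidate)


entry = {
    "question": "What is 2 + 2?",
    "similar_questions": ["what is  2 + 2?", "What is 3 + 3?"],
}
flagged = is_contaminated(entry, exact_match_judge)
```

The placeholder judge only catches trivial duplicates; the point of the LLM prompts is to catch rephrasings that string matching would miss.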

We can leverage the generate_solutions.py code if we host our Llama-405B model as outlined below:

python nemo_skills/inference/generate_solutions.py \
    output_file=/home/wedu/workspace/NeMo-Skills/datasets/gsm8k/test_405b_few_shot_gsm8k-augmentation-questions.jsonl \
    +prompt=openai/math-detect-few-shot \
    ++dataset=gsm8k \
    ++split_name=original_test \
    ++max_samples=5000 \
    ++server.server_type=tensorrt_llm \
    ++server.host=batch-block1-1075 \
    ++sandbox.host=batch-block1-1075 \
    ++batch_size=512

I also added a contaminator eval type, similar to our math-grader, in case we want to use GPT-4 directly. It can be run with the following command:

python nemo_skills/evaluation/evaluate_results.py \
    prediction_jsonl_files=/home/wedu/workspace/NeMo-Skills/datasets/gsm8k/test_405b_zero_shot.jsonl \
    eval_type=contaminator \
    ++eval_config.grading_type=llm \
    ++eval_config.grading_config.skip_filled=False \
    ++eval_config.grading_config.batch_size=50

Kipok commented 1 month ago

Merged as part of the refactoring branch in #114.