JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Implementation of prompt techniques #1018

Closed: chakravarthik27 closed this issue 2 months ago

chakravarthik27 commented 2 months ago

Few-Shot Prompting:

Few-shot prompting is a technique for improving the performance of a large language model (LLM) by supplying a small number of targeted examples (known as "shots"). Each shot pairs a prompt with the desired response, steering the LLM toward the expected output for a particular task.
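
As a rough, framework-agnostic illustration (not LangTest's internal implementation; build_few_shot_prompt is a hypothetical helper), the shots can simply be prepended to the new question:

# Illustrative sketch only; LangTest assembles its prompts internally.
shots = [
    {"question": "Is the sky blue on a clear day?", "answer": "True"},
    {"question": "Do penguins live at the North Pole?", "answer": "False"},
]

def build_few_shot_prompt(shots, new_question):
    # Each shot demonstrates the expected question/answer format.
    lines = ["Answer with `true` or `false`."]
    for shot in shots:
        lines.append(f"Q: {shot['question']}\nA: {shot['answer']}")
    lines.append(f"Q: {new_question}\nA:")
    return "\n\n".join(lines)

print(build_few_shot_prompt(shots, "Is water wet?"))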

The LangTest framework evaluates an LLM across multiple datasets using few-shot prompts. The evaluation below uses distinct prompt configurations for two datasets, "BoolQ" and "NQ-open". Each dataset gets tailored instructions and a designated prompt type that shape the model's responses, whether as instruction-following completions or conversational exchanges.

BoolQ Configuration: The BoolQ (Boolean Questions) configuration tests the model's ability to give a plain 'true' or 'false' answer based on the provided context. The instructions stress conciseness and accuracy, and the configuration includes sample interactions that show the model how to handle context-dependent questions.

NQ-open Configuration: The NQ-open (Natural Questions, open-domain) setup assesses the model's ability to give concise answers to open-ended questions that demand specific information. As with BoolQ, this configuration uses an "instruct" prompt type aimed at eliciting direct, relevant responses without superfluous detail.

Both configurations use few-shot prompting to teach the model the expected response format and depth, so that it can generalize from a handful of examples to new, unseen queries. This tests the model's accuracy and contextual appropriateness under minimal guidance.

Configuration Methods:

The Harness class can be configured in two ways: with a YAML file, or by passing the same settings as a dictionary to the Harness config argument (an equivalent dictionary sketch follows the YAML example below).

YAML Configuration (saved as config.yaml):

prompt_config:
  "BoolQ":
    instructions: "Provide a concise response. The answer should be either `true` or `false`."
    prompt_type: "instruct"
    examples:
      - user:
          context: "The Good Fight -- A second 13-episode season premiered on March 4, 2018. On May 2, 2018, the series was renewed for a third season."
          question: "Is there a third series of The Good Fight?"
        ai:
          answer: "True"
      - user:
          context: "Lost in Space -- The fate of the castaways is never resolved, as the series was unexpectedly canceled at the end of season 3."
          question: "Did the Robinsons ever get back to Earth?"
        ai:
          answer: "False"
  "NQ-open":
    instructions: "Provide a brief and precise answer."
    prompt_type: "instruct"
    examples:
      - user:
          question: "Where does the electron come from in beta decay?"
        ai:
          answer: "An atomic nucleus"
      - user:
          question: "Who wrote 'You're a Grand Old Flag'?"
        ai:
          answer: "George M. Cohan"

tests:
  defaults:
    min_pass_rate: 0.8
  robustness:
    uppercase:
      min_pass_rate: 0.8
    add_typo:
      min_pass_rate: 0.8
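
Dictionary Configuration (a sketch mirroring the YAML above, trimmed to one example per dataset for brevity):

config = {
    "prompt_config": {
        "BoolQ": {
            "instructions": "Provide a concise response. The answer should be either `true` or `false`.",
            "prompt_type": "instruct",
            "examples": [
                {
                    "user": {
                        "context": "The Good Fight -- A second 13-episode season premiered on March 4, 2018. On May 2, 2018, the series was renewed for a third season.",
                        "question": "Is there a third series of The Good Fight?",
                    },
                    "ai": {"answer": "True"},
                },
            ],
        },
        "NQ-open": {
            "instructions": "Provide a brief and precise answer.",
            "prompt_type": "instruct",
            "examples": [
                {
                    "user": {"question": "Who wrote 'You're a Grand Old Flag'?"},
                    "ai": {"answer": "George M. Cohan"},
                },
            ],
        },
    },
    "tests": {
        "defaults": {"min_pass_rate": 0.8},
        "robustness": {
            "uppercase": {"min_pass_rate": 0.8},
            "add_typo": {"min_pass_rate": 0.8},
        },
    },
}

Pass this object as config=config when constructing the Harness, in place of the YAML file path.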

Using the Harness Class:

from langtest import Harness

harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo-instruct", "hub": "openai"},
    data=[
        {"data_source": "BoolQ", "split": "test-tiny"},
        {"data_source": "NQ-open", "split": "test-tiny"},
    ],
    config="config.yaml",
)

Execute the following to generate the test cases, run them against the model, and produce the report:

harness.generate().run().report()
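
Equivalently, the steps can be invoked one at a time (a sketch, assuming report() returns a printable summary of pass rates, as it does in LangTest's documented usage):

harness.generate()          # build test cases from the datasets and config
harness.run()               # query the model using the few-shot prompts
report = harness.report()   # summarize pass rates per test type
print(report)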