Aidenzich / road-to-master

A repo to store our research footprint on AI
MIT License

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models #66

Open Aidenzich opened 2 hours ago

Aidenzich commented 2 hours ago

Key Observation

Prompt Examples

Standard Prompting:

Instruct: Provide your output in the following text format:
Step by step reasoning: ...
Answer: The final answer is ...
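To illustrate how a reply in this text format might be scored, here is a minimal sketch that pulls the final answer out with a regular expression. The helper name `extract_text_answer` and the sample reply are hypothetical, not taken from the paper.

```python
import re
from typing import Optional

# Hypothetical helper (not from the paper): one way to read the text format.
ANSWER_PATTERN = re.compile(r"The final answer is\s*(.+)", re.IGNORECASE)


def extract_text_answer(reply: str) -> Optional[str]:
    """Return the text after the last 'The final answer is', or None if absent."""
    matches = ANSWER_PATTERN.findall(reply)
    if not matches:
        return None
    # Use the last match and drop surrounding whitespace and a closing period.
    return matches[-1].strip().rstrip(".").strip()


# Made-up reply for illustration only.
sample_reply = (
    "Step by step reasoning: 3 boxes hold 4 apples each, so 3 * 4 = 12.\n"
    "Answer: The final answer is 12."
)
print(extract_text_answer(sample_reply))
```

Because the reasoning is free text here, nothing in it can break the parse; only the last line has to follow the pattern.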

Format-Restricting Instructions

JSON Format Prompting:

Instruct: Provide your output in the following valid JSON format:
{
  "step_by_step_reasoning": "...",
  "answer": "..."
}
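A minimal sketch of how a reply to the JSON prompt could be validated, assuming the model may wrap the object in extra text. The function name `parse_json_reply` and the sample replies are hypothetical; only the two key names come from the prompt above.

```python
import json
from typing import Optional

# Keys requested by the JSON format prompt.
REQUIRED_KEYS = {"step_by_step_reasoning", "answer"}


def parse_json_reply(reply: str) -> Optional[dict]:
    """Return the parsed object if the reply holds the expected JSON, else None."""
    # Assumption: the model may add text around the JSON, so keep only the
    # span from the first '{' to the last '}'.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject anything that is not an object with both requested keys.
    if not isinstance(parsed, dict) or not REQUIRED_KEYS <= set(parsed):
        return None
    return parsed


# Made-up replies for illustration only.
good_reply = 'Sure! {"step_by_step_reasoning": "3 * 4 = 12", "answer": "12"}'
bad_reply = "Step by step reasoning: 3 * 4 = 12. The final answer is 12"

print(parse_json_reply(good_reply))
print(parse_json_reply(bad_reply))
```

Unlike the text format, a reply here can fail on structure alone (missing key, broken quoting) even when the reasoning inside it is correct.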

YAML Format Prompting:

Instruct: Provide your output in the following valid YAML format:
reasoning: |
  <think step by step>,
answer: <answer>

XML Format Prompting:

Instruct: Provide your output in the following valid XML format:
<root>
  <reason>[think step by step]</reason>
  <answer>[answer]</answer>
</root>
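A similar sketch for the XML prompt, using the standard library's `xml.etree.ElementTree`. The function name `parse_xml_reply` and the sample replies are hypothetical; the tag names come from the prompt above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_xml_reply(reply: str) -> Optional[dict]:
    """Return the reason and answer from a <root> block, or None if it is invalid."""
    # Assumption: the model may add text around the XML, so cut out the
    # span from the first <root> to the last </root>.
    start = reply.find("<root>")
    end = reply.rfind("</root>")
    if start == -1 or end == -1:
        return None
    try:
        root = ET.fromstring(reply[start:end + len("</root>")])
    except ET.ParseError:
        return None
    reason = root.findtext("reason")
    answer = root.findtext("answer")
    # findtext returns None when the child element is missing.
    if reason is None or answer is None:
        return None
    return {"reason": reason.strip(), "answer": answer.strip()}


# Made-up replies for illustration only.
good_reply = "<root>\n  <reason>3 * 4 = 12</reason>\n  <answer>12</answer>\n</root>"
# An unescaped '<' inside the reasoning makes the document ill-formed.
broken_reply = "<root><reason>2 < 3, so yes</reason><answer>yes</answer></root>"

print(parse_xml_reply(good_reply))
print(parse_xml_reply(broken_reply))
```

The second reply shows one concrete way a format restriction can cost correctness: the reasoning is fine, but a bare `<` in it invalidates the whole response.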

Specific Task Prompts

Mathematical Problem-Solving Task:
Given: A mathematical question or problem
Required: A numerical answer only
Role: You are a math tutor assisting students of all levels
Process: Think step by step to solve the problem
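
The task description and a format instruction can be combined into one prompt. The sketch below is only one possible assembly: the ordering, the `Question:` label, the wording of `TASK_DESCRIPTION`, and the function name `build_prompt` are my assumptions, while the format instructions are copied from the examples in this thread.

```python
# Format instructions copied from the prompt examples in this thread.
FORMAT_INSTRUCTIONS = {
    "text": (
        "Provide your output in the following text format:\n"
        "Step by step reasoning: ...\n"
        "Answer: The final answer is ..."
    ),
    "json": (
        "Provide your output in the following valid JSON format:\n"
        "{\n"
        '  "step_by_step_reasoning": "...",\n'
        '  "answer": "..."\n'
        "}"
    ),
    "yaml": (
        "Provide your output in the following valid YAML format:\n"
        "reasoning: |\n"
        "  <think step by step>,\n"
        "answer: <answer>"
    ),
    "xml": (
        "Provide your output in the following valid XML format:\n"
        "<root>\n"
        "  <reason>[think step by step]</reason>\n"
        "  <answer>[answer]</answer>\n"
        "</root>"
    ),
}

# Assumption: role, process, and requirement merged into one paragraph.
TASK_DESCRIPTION = (
    "You are a math tutor assisting students of all levels. "
    "Think step by step to solve the problem and give a numerical answer only."
)


def build_prompt(question: str, fmt: str) -> str:
    """Assemble task description, question, and format instruction (hypothetical layout)."""
    if fmt not in FORMAT_INSTRUCTIONS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{TASK_DESCRIPTION}\n\nQuestion: {question}\n\n{FORMAT_INSTRUCTIONS[fmt]}"


# Made-up question for illustration only.
print(build_prompt("What is 3 * 4?", "json"))
```

Keeping the task text fixed and swapping only the format instruction isolates the format as the single variable, which matches what the comparison is after.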

Aidenzich commented 2 hours ago

Comparison

[Image: Screenshot 2024-11-16 at 4:06:11 PM]
[Image: Screenshot 2024-11-16 at 4:07:03 PM]

Aidenzich commented 2 hours ago

The observed phenomena regarding the performance of large language models (LLMs) under format restrictions can be attributed to several key factors:

Aidenzich commented 2 hours ago

My review

I think the observed phenomena can be read as biases inherited from the training dataset. An LLM's performance is bounded by its training data, so biases and inconsistencies in that data may affect how well the model performs under a specific format restriction. The differences across tasks may reflect the same thing: during training, the model may have picked up a preference for certain formats or response types. In short, the performance variation under format restrictions may stem from inherent biases in the training data.