Aidenzich opened this issue 2 hours ago (status: Open)
I think the observed phenomena can be explained as biases inherited from the training data. An LLM's performance is constrained by what it was trained on, so biases and inconsistencies in that data can surface when the model is forced to respond under a specific format restriction. Likewise, differences in performance across tasks may reflect the formats and response styles the model saw most often during training. In short, the performance variation under format restrictions is likely tied to inherent biases in the training dataset.
Key Observation
Prompt Examples
The four prompt framings compared are listed below; a combined sketch follows the list.
Standard Prompting:
JSON Format Prompting:
YAML Format Prompting:
XML Format Prompting:
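To make the comparison concrete, here is a minimal sketch of how the four framings could be constructed: the same question is posed each time, and only the format-restriction instruction changes. The question text, key names, and instruction wording are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch: build the same question under four prompt framings
# (standard, JSON, YAML, XML). All wording below is assumed for illustration.

QUESTION = "Q: A farmer has 17 sheep and buys 8 more. How many sheep are there now?"

FORMAT_INSTRUCTIONS = {
    "standard": "Answer the question. Think step by step, then give the final answer.",
    "json": (
        "Answer the question. Respond ONLY in valid JSON with the keys "
        '"reasoning" (string) and "answer" (number).'
    ),
    "yaml": (
        "Answer the question. Respond ONLY in valid YAML with the fields "
        "reasoning (string) and answer (number)."
    ),
    "xml": (
        "Answer the question. Respond ONLY in valid XML using the tags "
        "<reasoning> and <answer>."
    ),
}

def build_prompt(format_name: str) -> str:
    """Combine the shared question with the format-specific instruction."""
    return f"{QUESTION}\n\n{FORMAT_INSTRUCTIONS[format_name]}"

if __name__ == "__main__":
    for name in FORMAT_INSTRUCTIONS:
        print(f"--- {name} ---")
        print(build_prompt(name))
        print()
```

Comparing model accuracy across these otherwise-identical prompts is what would expose a format-dependent bias: if the training data favored free-form answers, the stricter framings should degrade performance.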
Specific Task Prompts
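As an illustration of how a task-specific prompt might combine with a format restriction, here is a hypothetical example; the task, review text, and key names are assumptions and not taken from the paper.

```python
# Hypothetical task-specific prompt: a sentiment-classification task with a
# JSON format restriction attached. All wording and key names are assumed.

TASK_PROMPT = (
    "Classify the sentiment of the following review as positive or negative.\n"
    'Review: "The battery lasts all day and the screen is gorgeous."\n\n'
    'Respond ONLY in valid JSON with the keys "label" and "confidence".'
)

print(TASK_PROMPT)
```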