Open Aidenzich opened 7 months ago
Generic feedback, such as "improve the efficiency of the code", lacks this precision and direction.
These are essentially what you'll see in the gpt-prompt-engineer repo and in the automatic evaluation metrics at my company. Pairwise comparison appears to be more useful than scoring each output on its own with an evaluation prompt or a fine-tuned classifier.
https://arxiv.org/pdf/2303.17651.pdf
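To make the pairwise comparison idea above concrete, here is a minimal sketch of one common way to turn pairwise judgments into a ranking: an Elo-style rating update. It is not taken from any specific repo or internal tool. The names `rank_by_pairwise` and `toy_judge`, the `k` factor, and the starting rating are all assumptions for illustration, and `toy_judge` is a hypothetical stand-in for whatever actually compares two outputs (for example, an LLM asked which of two responses is better).

```python
from itertools import combinations
from typing import Callable, Dict, List


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B given their ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rank_by_pairwise(
    candidates: List[str],
    judge: Callable[[str, str], float],
    k: float = 32.0,          # assumed update step size
    initial: float = 1200.0,  # assumed starting rating
) -> Dict[str, float]:
    """Compare every pair once and update Elo-style ratings.

    judge(a, b) returns 1.0 if a is better, 0.0 if b is better, 0.5 for a tie.
    """
    ratings = {c: initial for c in candidates}
    for a, b in combinations(candidates, 2):
        score_a = judge(a, b)
        exp_a = expected_score(ratings[a], ratings[b])
        # Zero-sum update: whatever a gains, b loses.
        ratings[a] += k * (score_a - exp_a)
        ratings[b] -= k * (score_a - exp_a)
    return ratings


def toy_judge(a: str, b: str) -> float:
    """Hypothetical judge for demonstration only: prefers the longer text."""
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


if __name__ == "__main__":
    ratings = rank_by_pairwise(["a", "bbb", "cc"], toy_judge)
    for text, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{text}: {rating:.1f}")
```

In practice the judge would be swapped for a real comparison call, and each pair is often judged in both orders to reduce position bias.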