Open yoshikiohshima opened 9 months ago
Come to think of it, the side that checks whether the result is true or false might be confused by the result the model gives.
Yes, that was the case: it cannot follow the instruction to produce only "yes" or "no". But it also gets the answer wrong:
result Yes, these two entries describe the same event.
yes, 1965: Butler Lampson's thesis., 1967: Case Western Reserve University founded.
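One way the checking side could tolerate verbose replies like the one above is to look only at the first word of the reply. This is a minimal sketch, not the benchmark's actual code: the function name `parse_yes_no` is hypothetical, and it assumes the reply text is passed in without the `result` label shown in the log.

```python
import re
from typing import Optional


def parse_yes_no(reply: str) -> Optional[bool]:
    """Map a free-form model reply to True (yes), False (no), or None (unclear).

    Hypothetical helper: only the first alphabetic word is inspected,
    ignoring case, leading whitespace, and trailing punctuation.
    """
    match = re.match(r"\s*([A-Za-z]+)", reply)
    if match is None:
        return None  # empty reply or one that does not start with a word
    word = match.group(1).lower()
    if word == "yes":
        return True
    if word == "no":
        return False
    return None  # anything else is treated as "could not tell"


if __name__ == "__main__":
    print(parse_yes_no("Yes, these two entries describe the same event."))
    print(parse_yes_no("no"))
    print(parse_yes_no("I think they might be related."))
```

Returning `None` for unclear replies keeps format failures separate from wrong answers, so the two problems can be counted independently.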
I ran wizardcoder-python-34b-v1.0.Q5_K_M.llamafile on substrate/NUC and ran the benchmark program in experiments/timeline-benchmark (with a one-line modification to APP_URL to access an LLM on the network).
Interestingly, it has a very strong "no" bias: out of 121 trials, 11 of which should be "yes", it gave me only one "yes":