Closed: daiwaid closed this 3 months ago
The `benchmark/README.md` contains confusing instructions, e.g. `-n gpt-4o`, which is very easy to confuse with `-m gpt-4o`. Besides, the Makefile (https://github.com/abcsys/libem/blob/0b6e38c918cf68458ebe4258796363fd1abbe9a8/Makefile#L90) has not been updated to reflect the name changes.
The overall changes are good, but they can be simplified. Please see if you can address these points in a follow-up PR (no need to revert).
Let's address these in #76
@daiwaid After a closer look and a test run, there are several issues that need to be addressed.
First:
Also:
In the benchmark suite, the configurations are more complicated than necessary. E.g., the benchmark suite should always print and log the results.
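As a rough illustration of the "always print and log" point, here is a minimal sketch in which reporting is unconditional instead of being controlled by configuration flags. All names here (`report`, `run_benchmark`, `benchmark.log`) are hypothetical and are not taken from libem's actual code, and the metrics are left as a placeholder.

```python
import json
from pathlib import Path


def report(results: dict, log_dir: Path) -> Path:
    """Print results to stdout and append them to a log file, unconditionally."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / "benchmark.log"  # hypothetical file name
    line = json.dumps(results, sort_keys=True)
    print(line)  # always print
    with log_file.open("a", encoding="utf-8") as f:
        f.write(line + "\n")  # always log
    return log_file


def run_benchmark(name: str, log_dir: Path) -> dict:
    """Hypothetical entry point: there is no 'print' or 'log' option to configure."""
    # Placeholder result; a real run would compute actual metrics here.
    results = {"benchmark": name, "metrics": None}
    report(results, log_dir)
    return results


if __name__ == "__main__":
    run_benchmark("demo", Path("logs"))
```

With this shape, callers only choose where the logs go. Whether results are printed or logged is no longer something the configuration has to express.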