daiwaid closed this 2 months ago
Nice! This is a key feature. @daiwaid Could you add an example in examples/optimize/batch.py? I will also use it to run some testing on your branch to help with the review. Thanks!
In the example, perhaps do just 10 pairs to avoid burning tokens. Also, print out the stats so that we know about the latency etc.
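A self-contained sketch of the shape such an example could take; the `match_batch` stub below stands in for the real batched libem call introduced in this PR, so the names and signature here are assumptions, not the actual API:

```python
import time

# Stub matcher standing in for the real batched libem call: it labels a
# whole batch of record pairs in one model round trip. Swap in the actual
# batch API from this PR to run against a live model.
def match_batch(pairs):
    time.sleep(0.01)  # simulate a single round trip for the whole batch
    return ["yes" if left == right else "no" for left, right in pairs]

# Keep the example small (10 pairs) to avoid burning tokens.
pairs = [(f"product {i}", f"product {i if i % 2 == 0 else i + 1}")
         for i in range(10)]

start = time.time()
answers = match_batch(pairs)
latency = time.time() - start

# Print the stats so latency etc. are visible.
print(f"pairs: {len(pairs)}")
print(f"latency: {latency:.2f}s")
print(f"matches: {answers.count('yes')}")
```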
Added the example, and handled the case where the model outputs only a single answer for an entire batch.
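For reviewers, a minimal sketch of what that fallback could look like; this is not the PR's actual parsing code, just an illustration of broadcasting a single collapsed answer across the batch:

```python
def parse_batch_answers(raw: str, n_pairs: int) -> list[str]:
    """Parse a batched model response into one answer per pair.

    If the model collapses the batch into a single answer, broadcast
    that answer to every pair instead of failing.
    """
    answers = [line.strip().lower() for line in raw.splitlines() if line.strip()]
    if len(answers) == 1 and n_pairs > 1:
        # Model answered once for the whole batch: apply it to every pair.
        answers = answers * n_pairs
    if len(answers) != n_pairs:
        raise ValueError(f"expected {n_pairs} answers, got {len(answers)}")
    return answers
```

For example, `parse_batch_answers("no", 3)` yields `["no", "no", "no"]`, while a well-formed three-line response is passed through one answer per pair.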
Please check #35 (note that #33 was merged earlier).
Merged new changes from #33, #35 and cleaned up commit history to align with main branch.
This should now be good to go.
Keeping a note on the performance improvements:
Benchmark: Matching done in 107.63s.
Benchmark: Precision 66.57
Benchmark: Recall 93.59
Benchmark: F1 score 77.8
Benchmark: Cost $0.53134
This is on the amazon-google dataset; we see roughly 7x faster completion and ~3% higher F1, at half the cost.
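As a quick sanity check, the reported F1 is consistent with the precision and recall in the benchmark output above:

```python
# Precision and recall as reported by the benchmark run above.
precision, recall = 66.57, 93.59

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 1))  # → 77.8, matching the reported F1 score
```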
The normal (single-pair) matching code in libem/match.py and benchmark/util.py is left as is in anticipation of a rework in PR #32. The TODOs in benchmark/util.py require changes from PR #33.