abcsys / libem

Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
https://libem.org
Apache License 2.0
17 stars 3 forks source link

fix: reset args.kwargs before running benchmark suites #69

Closed Char15Xu closed 1 month ago

Char15Xu commented 1 month ago

This pr fixed keyError issue when running different datasets in all files from benchmark/suite.

Relavent issue: #67 bullet point 1

issue: args.kwargs kept values from old dataset.

fIx: reset args.kwargs before running dataset

zenodflow commented 1 month ago

@Char15Xu What is the use of args.kwargs? It feels like we need a bigger change for the args.kwargs design. It's mysterious why one needs to reset it.

Char15Xu commented 1 month ago

Based on my understanding of the original code, it is used in libem/prepare/datasets to parse data from datasets.

zenodflow commented 1 month ago

@Char15Xu @daiwaid Instead of relying on the client code to reset the kwargs -- which is really an unknown unknown for them - we should do a deep copy of the args here: https://github.com/abcsys/libem/blob/b16907cdd5334f915ef585cbf74fc86a2c1db3ff/benchmark/classic/abt_buy.py#L38

!

It's usually a bad pattern passing in an object, modify it, and then pass it around in the return value. The client code needs to be aware of any of these modifications / side-effects made by the method.

daiwaid commented 1 month ago

Alternate fix implemented in PR #73.