Open nathan-weinberg opened 3 weeks ago
This seems very reasonable!
So @nathan-weinberg scipy
is installed by way of scikit-learn
--while we don't use sklearn
in too many places within lm_eval.api.metrics
, it might be annoying for many users to have to manually install a scikit-learn extra for any task using F1.
If scipy being installed is a big dealbreaker though, can likely just make sklearn an optional dependency as well. But curious if there are any in-betweens commonly taken here or that could still be useful
@haileyschoelkopf (responding from the same team as @nathan-weinberg) I don't know of any in-between options and so making sklearn an optional dep seems like it would need to be part of the solution. If there is an alternative you are thinking of please let us know. Really appreciate the consideration!
Hi @haileyschoelkopf
I'm also the same team. We haven't considered the dependency on sklearn
. I can see how the additional dependency makes the situation more complicated.
I have another idea. We are rebuilding all wheels from sources anyway. Our build pipeline has the capability to modify a package to e.g. drop a dependency. It would help us a lot of you could either consolidate import scipy, sklearn
into one module or use function-level imports instead of module-level imports for scipy
and sklearn
. That would reduce the complexity of our downstream changes.
Example:
def cb_multi_fi(items):
from sklearn.metrics import f1_score
preds, golds = zip(*items)
preds = np.array(preds)
golds = np.array(golds)
f11 = f1_score(y_true=golds == 0, y_pred=preds == 0)
f12 = f1_score(y_true=golds == 1, y_pred=preds == 1)
f13 = f1_score(y_true=golds == 2, y_pred=preds == 2)
avg_f1 = np.mean([f11, f12, f13])
return avg_f1
What do you think?
@haileyschoelkopf I'm also from the same team as well. I'd be happy to open up a PR consolidating the imports into one module or moving the imports around to being function-level as long as you'd be receptive to reviewing and accepting such a change.
Function-level imports seem like a good idea! Would be happy to accept that change : )
@haileyschoelkopf feel free to reassign the issue to me if the above PR is satisfactory on y'alls side!
Context: The
scipy
package can be difficult to redistribute as it requires fortran and some other custom libraries.Looking at the code,
scipy
is only used in two places:_align_bags
func inutils.py
for DROP: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/drop/utils.py#L125calculate_z_value
func inmodel_comparator.py
script: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/scripts/model_comparator.py#L30Would the maintainers be amenable to making
scipy
a local import for these functions so other aspects of the library can be used without needing to installscipy
?