PAIR-code / lit

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
https://pair-code.github.io/lit
Apache License 2.0

Potential performance issue: .to_dict method slow in pandas below 2.2 #1416

Open TendouArisu opened 8 months ago

TendouArisu commented 8 months ago

Issue Description:

Hello. I have found a performance degradation in the .to_dict method in pandas 1.5.3, and parts of this repository depend on that version. Several files, such as lit_nlp/examples/datasets/glue.py, call the affected API, and there may be others. I am not sure whether this pandas performance problem affects this repository. The problem is discussed on the pandas GitHub in #50990 and #54824.

Suggestion

I recommend upgrading to pandas >= 2.2 or exploring other ways to avoid the slowdown. Any other workarounds or solutions would be greatly appreciated. Thank you!
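To check whether a given environment is affected, something like the following rough sketch may help. It compares a pandas version string against the suggested 2.2 floor and times .to_dict on a synthetic frame. The helper names (release_tuple, is_below_suggested, time_to_dict), the frame contents, and the choice of orient="records" are assumptions made for illustration; they are not taken from the LIT codebase or from the linked pandas issues.

```python
import re
import timeit


def release_tuple(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '1.5.3' or '2.2.0rc0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_below_suggested(version: str, minimum: tuple[int, int] = (2, 2)) -> bool:
    """True if `version` is older than the suggested pandas floor (2.2 by default)."""
    return release_tuple(version) < minimum


def time_to_dict(rows: int = 10_000, repeats: int = 5) -> float:
    """Best-of-`repeats` wall time, in seconds, for one .to_dict call.

    The frame shape and orient="records" are illustrative assumptions.
    """
    import pandas as pd  # imported lazily so the version helpers work without pandas

    frame = pd.DataFrame(
        {"sentence": ["example text"] * rows, "label": list(range(rows))}
    )
    timings = timeit.repeat(
        lambda: frame.to_dict(orient="records"), number=1, repeat=repeats
    )
    return min(timings)


if __name__ == "__main__":
    import pandas as pd

    print("pandas version:", pd.__version__)
    print("below suggested 2.2 floor:", is_below_suggested(pd.__version__))
    print(f"to_dict best time: {time_to_dict():.4f} s")
```

Running the script once under pandas 1.5.3 and once under pandas >= 2.2 would give a rough comparison of how much the upgrade matters for frames of this size.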

RyanMullins commented 8 months ago

Thanks for the report! There are a few (significant) version bumps in the works for LIT and I'll add this to the list. Will keep you updated on progress as best I can.