EpistasisLab / Aliro

Aliro: AI-Driven Data Science
https://epistasislab.github.io/Aliro
GNU General Public License v3.0

Potential performance issue: .to_json memory leak in pandas versions below 1.4 #642

Open TendouArisu opened 4 months ago

TendouArisu commented 4 months ago

Issue Description:

Hello. I found a performance degradation in the .to_json function of pandas 1.1.0, and I noticed that this repository depends on pandas 1.1.0 in docker/lab/files/requirements.txt. I am not sure whether this pandas problem affects this repository. There are related discussions on the pandas GitHub, including #43877 and #45489. I also found that ai/metalearning/export_to_mongo.py uses the affected API, and other files may use it as well.
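To check which other files call the affected API, a small scan along these lines might help. It is only a sketch: the function name find_to_json_calls is hypothetical, it uses nothing beyond the Python standard library, and the regex matches any .to_json( call, so it cannot tell pandas objects from other types and every hit needs a manual look.

```python
import re
from pathlib import Path

# Matches a method call such as `df.to_json(`. The receiver type is unknown
# to a text search, so non-pandas hits are possible.
TO_JSON_CALL = re.compile(r"\.to_json\s*\(")


def find_to_json_calls(root):
    """Return (path, line_number, line_text) for each `.to_json(` call under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if TO_JSON_CALL.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling find_to_json_calls(".") from the repository root returns a list that can be printed or reviewed by hand.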

Suggestion

I recommend upgrading to pandas >= 1.4 or exploring other ways to improve the performance of .to_json. Any other workarounds or solutions would be greatly appreciated. Thank you!
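Until an upgrade lands, a guard like the one below could flag environments that still run an older pandas. This is a minimal sketch: the helper names are hypothetical, and the 1.4 threshold is taken from this issue rather than verified independently. It reads the installed version through importlib.metadata, so it works even when pandas is not installed.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def predates_pandas_1_4(version):
    """True if the version is below 1.4, the threshold named in this issue."""
    return parse_major_minor(version) < (1, 4)


def installed_pandas_version():
    """Return the installed pandas version string, or None if pandas is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None
```

For example, predates_pandas_1_4("1.1.0") is true for the version currently pinned in requirements.txt, while predates_pandas_1_4("1.4.0") is false. Comparing integer tuples rather than strings keeps a version such as 1.10 from sorting below 1.4.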