Alexia-I opened this issue 8 months ago
@abhishekkrthakur @toshihikoyanase
Thank you for your detailed report.
Although I'm not a maintainer of this repository, it seems the issue might be addressed by updating this specific line in setup.py.
@abhishekkrthakur Would you be open to updating the pandas version requirement? Thanks!
Thank you for your reply. Yes, if no dependency conflicts are involved in this case, it can be addressed with a simple dependency update.
Issue Description: Hello. I have discovered a performance degradation in the read_csv function of pandas 1.3.4 when handling CSV files with a large number of columns. Loading time grows from a few seconds in the previous version, 1.2.5, to several minutes, a slowdown of almost 60x. I found some related discussions on GitHub, including #44106 and #44192. Both autoxgb/src/autoxgb/predict.py and autoxgb/src/autoxgb/autoxgb.py use the affected API.
Steps to Reproduce:
I have created a small reproducible example to better illustrate this issue.
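The original example did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of such a benchmark, assuming the slowdown is triggered simply by a large column count. The helper names `make_wide_csv` and `time_read_csv` and the default sizes are hypothetical and not taken from the report.

```python
import csv
import io
import time


def make_wide_csv(n_rows: int = 100, n_cols: int = 10_000) -> str:
    """Build an in-memory CSV: one header row plus n_rows rows of integers."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f"col_{i}" for i in range(n_cols)])
    for r in range(n_rows):
        writer.writerow([r * n_cols + c for c in range(n_cols)])
    return buf.getvalue()


def time_read_csv(text: str, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of pandas.read_csv over `repeats` runs."""
    # Imported here so that make_wide_csv can be used without pandas installed.
    import pandas as pd

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pd.read_csv(io.StringIO(text))
        best = min(best, time.perf_counter() - start)
    return best
```

To compare versions, run `time_read_csv(make_wide_csv())` once in an environment with `pandas==1.2.5` and once with `pandas==1.3.4`. Taking the best of several runs reduces noise from caching and other processes.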
Suggestion
I would recommend upgrading to pandas >= 1.3.5 or exploring other ways to speed up CSV loading. Any other workarounds or solutions would be greatly appreciated. Thank you!
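Until the version requirement is raised, a project could at least warn users who are running an affected release. Below is a minimal sketch. It assumes the affected range is 1.3.0 up to (but not including) 1.3.5, based on the versions reported above (1.2.5 fast, 1.3.4 slow) and the suggestion to move to >= 1.3.5. The lower bound is an assumption. The function names are hypothetical, and only the leading `X.Y.Z` part of a version string is parsed.

```python
import re
import warnings
from typing import Tuple

# Assumption: releases in [1.3.0, 1.3.5) carry the read_csv slowdown.
AFFECTED_MIN = (1, 3, 0)
FIXED_IN = (1, 3, 5)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading 'X.Y.Z' of a version string such as '1.3.4' or '1.3.4+0.g1234'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_slow_read_csv_version(version: str) -> bool:
    """True if `version` falls in the assumed affected range [1.3.0, 1.3.5)."""
    return AFFECTED_MIN <= parse_version(version) < FIXED_IN


def warn_if_affected(version: str) -> None:
    """Emit a RuntimeWarning when the given pandas version is in the assumed affected range."""
    if is_slow_read_csv_version(version):
        warnings.warn(
            f"pandas {version} has a known read_csv slowdown on wide CSV files; "
            "consider pandas>=1.3.5",
            RuntimeWarning,
            stacklevel=2,
        )
```

A caller would pass `pandas.__version__` to `warn_if_affected`. The more direct fix is still the one proposed in this thread: raising the pandas lower bound in setup.py, whose exact line is not reproduced here.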