Dadmatech / DadmaTools

DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
Apache License 2.0

Potential performance Issue: Slow read_csv() Function with pandas 1.3.3 #70

Open TendouArisu opened 6 months ago

TendouArisu commented 6 months ago

Issue Description:

Hello. I have found a performance regression in the read_csv function of pandas 1.3.3. I noticed that dadmatools/requirements.txt pins pandas 1.3.3 and that some other dependencies require pandas below 1.4, but I am not sure whether this regression affects this repository. The problem is discussed on the pandas GitHub in #44158 and #44610. I also found that dadmatools/pipeline/informal2formal/utils.py and dadmatools/pipeline/informal2formal/VerbHandler.py use the affected API, and other files may use it as well.
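To check which other files call read_csv, one option is to scan the source tree. The following is only a minimal sketch using the standard library: the helper name `find_read_csv_calls` is hypothetical, and it matches any call whose name is `read_csv`, so it cannot tell whether a given call actually goes through pandas or hits the slow code path.

```python
import ast
from pathlib import Path


def find_read_csv_calls(root):
    """Return sorted (path, line number) pairs for calls named read_csv.

    Hypothetical helper for illustration; it matches by name only.
    """
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        for node in ast.walk(tree):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            # Covers pd.read_csv(...), pandas.read_csv(...) and bare read_csv(...)
            if isinstance(func, ast.Attribute):
                name = func.attr
            elif isinstance(func, ast.Name):
                name = func.id
            else:
                name = None
            if name == "read_csv":
                hits.append((str(path), node.lineno))
    return sorted(hits)
```

Running `find_read_csv_calls("dadmatools")` from the repository root would list candidate call sites to review by hand.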

Suggestion

I would recommend upgrading to pandas >= 1.4 or exploring other ways to improve read_csv performance. Any other workarounds or solutions would be greatly appreciated. Thank you!
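As a rough illustration of how such a pin could be detected automatically, the sketch below checks requirements text for an exact `pandas==` pin older than 1.4. The function name `pandas_pin_below_1_4` is hypothetical, it only handles exact `==` pins, and the 1.4 threshold comes from the suggestion above; it does not establish that every earlier version is affected.

```python
import re


def pandas_pin_below_1_4(requirements_text):
    """Return True if an exact `pandas==X.Y[.Z]` pin is older than 1.4.

    Hypothetical helper for illustration; range specifiers such as
    `pandas<1.4` or `pandas>=1.2` are not handled and return False.
    """
    for line in requirements_text.splitlines():
        match = re.match(r"\s*pandas\s*==\s*(\d+)\.(\d+)", line, flags=re.IGNORECASE)
        if match:
            major, minor = int(match.group(1)), int(match.group(2))
            # Tuple comparison orders versions by major, then minor
            return (major, minor) < (1, 4)
    return False
```

For example, passing the contents of dadmatools/requirements.txt would return True while the file still pins pandas 1.3.3.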

sadeghjafari5528 commented 6 months ago

Thank you for your comment; I will try to update the pandas version. However, I'm not sure whether our "informal2formal" function actually uses the affected API.

On another note, may I ask whether you know the Persian language?

TendouArisu commented 5 months ago

No, I don't know the Persian language. I encountered this problem in other repositories, so I wanted to notify other repositories of this potential problem.

sadeghjafari5528 commented 5 months ago

Thank you for your suggestion.