EpistasisLab / pmlb

PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms.
https://epistasislab.github.io/pmlb/
MIT License

Replacing `pandas-profiling` (deprecated) with `ydata-profiling` #183

Open gAldeia opened 2 weeks ago

gAldeia commented 2 weeks ago

The package formerly known as pandas-profiling is now part of a larger project and is moving away from the idea that it is intended only for use with data frames.

The package's name has changed, and the last version of pandas-profiling was released over a year ago.

The GitHub workflow for profiling new datasets is not working as it should due to deprecated dependencies.

I am trying to submit new datasets and have been running into issues related to pandas-profiling. It seems that one of its dependencies, pydantic, moved `BaseSettings` to the separate `pydantic-settings` package in version 2 (https://docs.pydantic.dev/2.0/migration/#basesettings-has-moved-to-pydantic-settings; also see the GitHub Actions error when running `python -m pmlb.profiling`). After looking at the PyPI page (https://pypi.org/project/pandas-profiling/), I found that we should just replace the package, and it should be fine.
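To make the swap concrete, here is a minimal sketch of an import that prefers the new package and falls back to the old one. The helper name `load_profile_report` and the fallback order are assumptions made for illustration; this is not how `pmlb.profiling` is actually written. Both packages expose a `ProfileReport` class, which is what makes a drop-in replacement plausible.

```python
import importlib

# Candidate module names, newest first. Both names come from this thread;
# the fallback order is an assumption made for this sketch.
CANDIDATE_MODULES = ("ydata_profiling", "pandas_profiling")


def load_profile_report():
    """Return (module_name, ProfileReport class), or (None, None) if unavailable."""
    for name in CANDIDATE_MODULES:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Broad catch on purpose: a missing package raises ImportError,
            # but an outdated package can fail at import time with other
            # error types (for example after a dependency moved an API).
            continue
        report_cls = getattr(module, "ProfileReport", None)
        if report_cls is not None:
            return name, report_cls
    return None, None


if __name__ == "__main__":
    source, report_cls = load_profile_report()
    if report_cls is None:
        print("No profiling package could be imported; try: pip install ydata-profiling")
    else:
        print(f"Using ProfileReport from {source}")
```

If the fallback is not needed, the simpler change is to replace the import line directly and update the dependency list to `ydata-profiling`.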

Please let me know if there are any changes that I should make.

Thank you for your attention and for reviewing this PR!

trangdata commented 2 weeks ago

Thanks for this @gAldeia! I'm struggling to resolve the action error related to reticulate at the moment. Any insight would be much appreciated! 🙏🏽

gAldeia commented 2 weeks ago

Hi @trangdata! I actually saw your PR a few moments after creating mine. Sorry for the duplicate. I decided to keep mine open as a reminder to update the docs, regardless of which PR is merged.

In fact, I spent the last two days trying to debug this reticulate error, and I managed to reproduce it locally. What seems to be happening is that reticulate is using a Python version other than 3.8 (the one used in GitHub Actions). Sometimes mine uses Python 3.10, and sometimes 3.12. I am working on getting reticulate to use Python 3.8, but I think either `reticulate::install_miniconda` or the pip that comes with it is interfering with the version selection and forcing a Python other than 3.8. I am not sure, though. I am running it in WSL. I tried installing the Ubuntu dev packages needed for TIFF support, but none of them helped. I also tried installing Pillow explicitly, with no success either.
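For anyone trying to reproduce this, a small diagnostic like the one below can help confirm which interpreter is actually being picked up. It is only a sketch: the function name `describe_interpreter` is hypothetical, and the expected version (3.8) is taken from the description above. It also prints the `RETICULATE_PYTHON` environment variable, which reticulate consults when choosing an interpreter; whether setting it resolves this particular error is untested here.

```python
import os
import sys

# Version the workflow is expected to use, per the discussion above.
EXPECTED = (3, 8)


def describe_interpreter(expected=EXPECTED, actual=None, executable=None):
    """Return a one-line report comparing the running Python to the expected one."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    executable = sys.executable if executable is None else executable
    status = "OK" if actual == tuple(expected) else "MISMATCH"
    return (
        f"{status}: running Python {actual[0]}.{actual[1]} at {executable}, "
        f"expected {expected[0]}.{expected[1]}"
    )


if __name__ == "__main__":
    print(describe_interpreter())
    # Shows whether an interpreter has been pinned for reticulate.
    print("RETICULATE_PYTHON =", os.environ.get("RETICULATE_PYTHON", "<not set>"))
```

Running the script with the interpreter reticulate reports (for example, the path shown by `reticulate::py_config()`) should make it clear whether 3.10 or 3.12 is being selected instead of 3.8.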

I will keep working on it, and if I figure out how to solve it I will let you know!