etalab-ia / DGML

This repo contains the code used to develop DGML (Data Gouv for Machine Learning), a data repository of datasets from data.gouv.fr for Machine Learning.
https://datascience.etalab.studio/dgml/
MIT License

Analyzed CSV datasets that only have a pandas profiling report #11

Open psorianom opened 3 years ago

psorianom commented 3 years ago

When using the automatic mode of DGML, some datasets turn out not to be usable for machine learning (they do not pass the filters, or there is some other problem with them). We still compute their pandas profiling reports. The problem is that these datasets have no task, so they do not appear on the DGML website. This raises the question: what should we do with these datasets? Should we add a new task category such as "EDA" (Exploratory Data Analysis)?

gsantar commented 3 years ago

Adding an EDA category would be a very good solution, I am OK with it. I believe the pandas profiling reports can still be useful if someone wants to explore a dataset. Do you want me to add this category to the dgml.py code, and to the app filters?
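
A minimal sketch of the fallback idea discussed above, not the actual dgml.py code: a dataset that has a profiling report but no detected task gets listed under "EDA" instead of being dropped. Every name here (`DatasetEntry`, `assign_task`, `EDA_TASK`, the field names) is hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label for the fallback category proposed in this issue.
EDA_TASK = "EDA"


@dataclass
class DatasetEntry:
    """Hypothetical record describing one analyzed dataset."""
    dataset_id: str
    passed_filters: bool
    detected_task: Optional[str] = None  # e.g. "classification", "regression"
    has_profile: bool = False            # a pandas profiling report was computed


def assign_task(entry: DatasetEntry) -> Optional[str]:
    """Return the category a dataset would be listed under, or None."""
    # Datasets usable for machine learning keep their detected task.
    if entry.passed_filters and entry.detected_task is not None:
        return entry.detected_task
    # Datasets with only a profiling report fall back to the EDA category.
    if entry.has_profile:
        return EDA_TASK
    # Nothing to show for this dataset.
    return None


entries = [
    DatasetEntry("ds-ml", passed_filters=True, detected_task="classification", has_profile=True),
    DatasetEntry("ds-profile-only", passed_filters=False, has_profile=True),
    DatasetEntry("ds-empty", passed_filters=False),
]
listed = {e.dataset_id: assign_task(e) for e in entries}
```

With a rule like this, the app filters would only need one extra option ("EDA") to surface the profile-only datasets.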

psorianom commented 3 years ago

Yes, thanks @giuliasantarsieri!