etalab-ia / DGML

This repo contains the code used to develop DGML (Data Gouv for Machine Learning), a data repository of datasets from data.gouv.fr for Machine Learning.
https://datascience.etalab.studio/dgml/
MIT License

Handle compressed csv files and create new EDA category #12

Open gsantar opened 3 years ago

gsantar commented 3 years ago

This PR introduces two changes in dgml.py:

  1. An extract_compressed_csv function: it extracts any zipped CSV files present in the data directory (such as csv_sample) so that the CSV files can then be loaded.
  2. A new EDA category in the task column of the main CSV file: this adds a filter to the app so that datasets that only have a pandas profile are also shown.
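
For readers without the diff at hand, here is a rough sketch of what the two changes could look like. It is not the code from dgml.py: only the name extract_compressed_csv comes from the description above. The function body, the add_eda_task helper, and the has_profile column name are assumptions made for illustration, and the real column layout of the main CSV file may differ.

```python
import zipfile
from pathlib import Path

import pandas as pd


def extract_compressed_csv(data_dir):
    """Extract the CSV members of every .zip archive found under data_dir.

    Files are written next to their archive. Returns the extracted paths.
    (Assumed behaviour; the real function in dgml.py may differ.)
    """
    extracted = []
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                # Skip anything in the archive that is not a CSV file.
                if member.lower().endswith(".csv"):
                    zf.extract(member, path=archive.parent)
                    extracted.append(archive.parent / member)
    return extracted


def add_eda_task(catalog, task_col="task", profile_col="has_profile"):
    """Label rows that have a pandas profile but no task as "EDA".

    `add_eda_task` and `has_profile` are hypothetical names.
    """
    out = catalog.copy()
    # A row has no task when the cell is missing or blank.
    no_task = out[task_col].isna() | (out[task_col].astype(str).str.strip() == "")
    out.loc[no_task & out[profile_col].astype(bool), task_col] = "EDA"
    return out
```

Under these assumptions, the app could filter on task == "EDA" to list datasets that only have a profile, while rows that already carry a task are left unchanged.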