UBC-MDS / datascience_eda

This package includes functions that help with common tasks during the EDA stage of a data science project.
MIT License

datascience_eda


This package provides functions that assist data scientists with common tasks during the exploratory data analysis (EDA) stage of a data science project. The functions support preliminary analysis of common column types (numeric, categorical, and text) and also run several experimental clusterings on the dataset.

Our functions are tailored to our own experience. Similar packages are also published on PyPI, and a few good ones are worth mentioning:

Installation

Several dependencies are not available on TestPyPI (test.pypi.org), so please use the exact command below to install the package. The extra index URL lets pip fetch those dependencies from the main PyPI index.

$ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple datascience-eda
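After installing, it can be useful to confirm that the distribution is visible to the current Python environment. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper written for this example and is not part of datascience_eda.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "datascience-eda" is the distribution name used in the pip command above
found = installed_version("datascience-eda")
print(found if found else "datascience-eda is not installed")
```

If the message reports that the package is not installed, check that pip and the Python interpreter belong to the same environment.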

Main Functions

Dependencies

The list of dependencies can be found at: https://github.com/UBC-MDS/datascience_eda/blob/main/pyproject.toml

Usage

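import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

import datascience_eda as eda

# Load the raw data and pick out the numeric columns
original_df = pd.read_csv("/data/menu.csv")
numeric_features = eda.get_numeric_columns(original_df)

# Impute missing values, then standardize the numeric columns
numeric_transformer = make_pipeline(SimpleImputer(), StandardScaler())
preprocessor = make_column_transformer(
    (numeric_transformer, numeric_features)
)
df = pd.DataFrame(
    data=preprocessor.fit_transform(original_df), columns=numeric_features
)

# Numeric exploration and clustering run on the scaled numeric data
eda.explore_numeric_columns(df)
eda.explore_clustering(df)

# df holds only numeric columns, so categorical and text exploration use the
# original data; the categorical column names below are placeholders
eda.explore_categorical_columns(original_df, ["categorical_column1", "categorical_column2"])
eda.explore_text_columns(original_df)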
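The preprocessing step above can be tried without `menu.csv` or datascience_eda installed. The following is a sketch under stated assumptions: the small DataFrame and its column names are made up for illustration, and `select_dtypes` stands in for `eda.get_numeric_columns`, whose exact behaviour is not shown here. It demonstrates that the resulting `df` contains only the imputed and standardized numeric columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for menu.csv; the column names are made up
original_df = pd.DataFrame(
    {
        "item": ["burger", "fries", "salad", "shake"],
        "calories": [550.0, 340.0, np.nan, 510.0],
        "sugar_g": [9.0, 0.0, 4.0, 63.0],
    }
)

# Assumed stand-in for eda.get_numeric_columns: select columns by dtype
numeric_features = original_df.select_dtypes(include="number").columns.tolist()

# Fill missing values with the column mean, then scale to zero mean, unit variance
numeric_transformer = make_pipeline(SimpleImputer(), StandardScaler())
preprocessor = make_column_transformer((numeric_transformer, numeric_features))

# Columns not listed in the transformer (here "item") are dropped by default
df = pd.DataFrame(
    data=preprocessor.fit_transform(original_df), columns=numeric_features
)

print(df.round(2))
```

Because the non-numeric `item` column is dropped, a frame built this way is suitable for numeric exploration and clustering, but not for categorical or text exploration.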

Documentation

The official documentation is hosted on Read the Docs: https://datascience_eda.readthedocs.io/en/latest/

Contributors

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab. Please check out our CONDUCTING.rst if you are interested in contributing to this project.

Credits

This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, which is modified from the pyOpenSci/cookiecutter-pyopensci and audreyr/cookiecutter-pypackage project templates.