UBC-MDS / HAM_Python

Explore & Impute Missing Data in Python
MIT License

HAM_Python

Handle All Missing (Values)


Project contributors:

  1. Duong Vu
  2. Jordan Dubchak
  3. Linsey Yao

To install please execute the following from the command line:

pip install git+https://github.com/UBC-MDS/HAM_Python.git

Introduction

Our package explores the pattern of missing values in a user's dataset and imputes those missing values using several methods.

We created this project because we could not find a package that handles both tasks in either R or Python. In R, we found the Amelia and vis_dat packages, which only visualize missing data. In Python, we found fancyimpute, which deals with missing values but does not offer any visualization, and missingno, which handles visualization only. We think this package will be a better fit for users who do not have much experience in data wrangling.

Dependencies

Python 3

matplotlib.pyplot

numpy

pandas

seaborn

warnings

Functions

Currently, our package only handles continuous features.
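
Because only continuous features are handled, it may help to keep just the numeric columns of a data frame before passing it to the package. The snippet below is a minimal sketch in plain pandas and is not part of HAM_Python; the helper name `continuous_columns` and the example column `label` are hypothetical.

```python
import numpy as np
import pandas as pd


def continuous_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the numeric columns of df (hypothetical helper, not in HAM_Python)."""
    return df.select_dtypes(include="number")


# A small frame mixing numeric columns with a text column
mixed = pd.DataFrame({
    "H": [1.0, 3.0, 9.0],
    "A": [2.0, np.nan, 22.0],
    "label": ["x", "y", "z"],
})

# Keep the numeric columns; missing values are left untouched
numeric_only = continuous_columns(mixed)
print(list(numeric_only.columns))
```

The text column is dropped and the missing value in `A` is preserved, so the result can still be explored or imputed afterwards.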

Typical Usage

import numpy as np
import pandas as pd
from ham.ham import todf, impute_missing, compare_model, vis_missing
# (from HAM_Python.ham import ham)
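# Build a small example with two missing values and convert it with todf
raw_data = np.matrix([[1, 2, 3], [3, np.nan, 5], [9, 22, np.nan]])
raw_data = todf(raw_data, ["H", "A", "M"])

# Visualize the pattern of missing values
vis_missing(raw_data, missing_val_char = np.nan)

# Impute missing values, passing column 'A' and the "CC" method
df = impute_missing(raw_data, 'A', "CC", np.nan)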

print(df)

compare_model(raw_data, 'A', ("CC","MIP"), np.nan)
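
For readers who want to see what these steps could amount to in plain pandas, here is a rough sketch that does not call HAM_Python at all. It assumes that "CC" refers to a complete-case approach (dropping rows where the column is missing) and that "MIP" refers to mean imputation; this README does not define either code, so both readings are assumptions rather than the package's documented behaviour.

```python
import numpy as np
import pandas as pd

# Same toy data as in the usage example, built directly as a DataFrame
data = pd.DataFrame(
    [[1, 2, 3], [3, np.nan, 5], [9, 22, np.nan]],
    columns=["H", "A", "M"],
)

# Explore: count missing values per column
missing_counts = data.isna().sum()

# Assumed reading of "CC": keep only rows where column "A" is observed
complete_case = data.dropna(subset=["A"])

# Assumed reading of "MIP": fill missing entries of "A" with the column mean
mean_imputed = data.copy()
mean_imputed["A"] = mean_imputed["A"].fillna(mean_imputed["A"].mean())

# Compare summary statistics of "A" under the two approaches
comparison = pd.DataFrame({
    "complete_case": complete_case["A"].describe(),
    "mean_imputed": mean_imputed["A"].describe(),
})

print(missing_counts)
print(comparison)
```

Putting the two summaries side by side makes the trade-off visible: dropping rows reduces the sample size, while mean imputation keeps every row but shrinks the spread of the column.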

HAM in R

This package is also available in R.