ResidentMario / missingno

Missing data visualization module for Python.
MIT License

md.pattern #101

Open vankesteren opened 4 years ago

vankesteren commented 4 years ago

Would it be possible to include a plot for patterns of missingness similar to the md.pattern functionality in the mice package in R?

Here's an example from that package: [image]

This plot tells us the following:

- 13 observations have 0 missing values
- 3 observations have missing values on chl only
- 10 observations have missing values on chl
- etc...

the patterns are easily visible and compact: the plot scales with the number of missingness patterns, not with the number of rows in the dataframe!
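To make the request concrete, here is a minimal sketch of the table that would sit behind such a plot: one row per distinct missingness pattern, with a count of how many observations share it. It uses plain pandas; `missing_patterns`, the `n_rows` column, and the toy data are made up for illustration and are not part of missingno's API.

```python
import numpy as np
import pandas as pd


def missing_patterns(df: pd.DataFrame) -> pd.DataFrame:
    """Count rows per distinct missingness pattern (hypothetical helper)."""
    # True where a value is missing, False where it is observed.
    mask = df.isna()
    # Group identical boolean rows together and count them.
    counts = mask.groupby(list(mask.columns), sort=False).size()
    patterns = counts.rename("n_rows").reset_index()
    # Most common patterns first.
    return patterns.sort_values("n_rows", ascending=False, ignore_index=True)


# Toy data (invented for this example): 5 rows, 3 columns.
df = pd.DataFrame(
    {
        "age": [1, 2, 3, 1, 2],
        "bmi": [22.0, np.nan, 25.1, np.nan, 27.3],
        "chl": [187.0, np.nan, np.nan, np.nan, 199.0],
    }
)

patterns = missing_patterns(df)
print(patterns)
```

The result has as many rows as there are distinct patterns, regardless of how many rows `df` has, which is the scaling property described above.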

ResidentMario commented 4 years ago

This is a neat idea! Will see what I can do.

SultanOrazbayev commented 4 years ago

I am both interested in the feature and interested in contributing to this. This would be especially handy with data that exceeds memory (so would be great to make this dask compatible).

SultanOrazbayev commented 4 years ago

@vankesteren: while the PR is reviewed, it would be great if you could do an independent test-drive of the new pattern function.

vankesteren commented 4 years ago

I'll see what I can do!

vankesteren commented 4 years ago

Looks great! Here is the pattern function applied to the same dataset:

[image]

I do have the following suggestions:

SultanOrazbayev commented 4 years ago

Thanks for the suggestions!

re: adding number of missing values: do you have a suggestion for the name of this column? values_missing?

vankesteren commented 4 years ago

yeah, that works! or maybe n_missing?
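For reference, a rough sketch of what such a column could hold, assuming a pattern table with one boolean column per variable and a row count per pattern. Only the name `n_missing` comes from this thread; `n_rows` and the toy data are assumptions, and this is not the code from the PR.

```python
import numpy as np
import pandas as pd

# Toy data (invented for this example).
df = pd.DataFrame(
    {
        "age": [1, 2, 3, 1, 2],
        "bmi": [22.0, np.nan, 25.1, np.nan, 27.3],
        "chl": [187.0, np.nan, np.nan, np.nan, 199.0],
    }
)

mask = df.isna()
value_cols = list(mask.columns)

# One row per distinct missingness pattern, with the number of rows sharing it.
patterns = (
    mask.groupby(value_cols, sort=False).size().rename("n_rows").reset_index()
)

# Number of variables missing within each pattern.
patterns["n_missing"] = patterns[value_cols].sum(axis=1)

# Per-variable totals, comparable to the bottom row of R's md.pattern output.
column_totals = mask.sum()

print(patterns)
print(column_totals)
```

As a sanity check, `n_rows * n_missing` summed over all patterns equals the total number of missing cells in the frame.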