Open vankesteren opened 4 years ago
This is a neat idea! Will see what I can do.
I am interested both in the feature and in contributing to it. This would be especially handy with data that exceeds memory (so it would be great to make this dask-compatible).
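To illustrate why this should work for data that exceeds memory: pattern counts are additive, so they can be tallied one chunk at a time and summed. The sketch below uses plain pandas rather than dask. The helper name `count_patterns_in_chunks` and the toy data are made up for illustration, and it assumes every chunk has the same columns in the same order.

```python
from collections import Counter

import numpy as np
import pandas as pd


def count_patterns_in_chunks(chunks):
    """Hypothetical helper: tally missingness patterns across an iterable of DataFrames."""
    totals = Counter()
    for chunk in chunks:
        mask = chunk.isna()
        # Turn each row of the boolean mask into a hashable tuple of Python bools.
        totals.update(map(tuple, mask.to_numpy().tolist()))
    return totals


# Toy data (not from the thread), split into chunks of two rows each.
toy = pd.DataFrame(
    {
        "age": [1, 2, 3, 4, 5, 6],
        "bmi": [20.0, np.nan, 22.0, np.nan, 25.0, 21.0],
        "chl": [180.0, np.nan, np.nan, np.nan, 200.0, 190.0],
    }
)
chunks = [toy.iloc[i : i + 2] for i in range(0, len(toy), 2)]
pattern_counts = count_patterns_in_chunks(chunks)
```

Only the counter of distinct patterns is kept in memory, not the rows. The chunks could just as well come from something like `pd.read_csv(..., chunksize=...)`; a dask version would need its own design.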
@vankesteren: while the PR is reviewed, it would be great if you could do an independent test-drive of the new `pattern` function.
I'll see what I can do!
Looks great! Here is the pattern function applied to the same dataset:
I do have the following suggestions:
- […] `md.pattern` R plot above)
- […] `md.pattern` plot)
- […] `mvcount` column simply `count` (or, to avoid overlap with the column names, maybe something like `_count_`?). `mvcount` in my head goes immediately to "multivariate count"
Thanks for the suggestions!
re: adding number of missing values: do you have a suggestion for the name of this column? `values_missing`?
Yeah, that works! Or maybe `n_missing`?
Would it be possible to include a plot for patterns of missingness similar to the `md.pattern` functionality in the `mice` package in `R`? Here's an example from that package:
This plot tells us the following:
- 13 observations have 0 missing values
- 3 observations have missing values on chl only
- 10 observations have missing values on chl
- etc.
The patterns are easily visible and compact: the plot scales with the number of missingness patterns, not with the number of rows in the dataframe!
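For reference, here is a minimal pandas sketch of the table such a plot could be drawn from: one row per distinct missingness pattern, with how often it occurs. This is not the implementation from the PR. The function name `missing_patterns`, the `count` and `n_missing` column names, and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_patterns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per distinct missingness pattern."""
    mask = df.isna()
    # Group identical rows of the boolean mask and count how often each occurs.
    # Note: this assumes df has no column already named "count" or "n_missing".
    patterns = mask.groupby(list(mask.columns)).size().reset_index(name="count")
    # Number of variables that are missing within each pattern.
    patterns["n_missing"] = patterns[list(df.columns)].sum(axis=1)
    # Most frequent pattern first.
    return patterns.sort_values("count", ascending=False, ignore_index=True)


# Toy data (not the dataset from the example plot).
toy = pd.DataFrame(
    {
        "age": [1, 2, 3, 4, 5, 6],
        "bmi": [20.0, np.nan, 22.0, np.nan, 25.0, 21.0],
        "chl": [180.0, np.nan, np.nan, np.nan, 200.0, 190.0],
    }
)
table = missing_patterns(toy)
print(table)
```

In the result, `True` marks a missing variable. The table has as many rows as there are distinct patterns, regardless of how many rows the dataframe has.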