HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

Added feature names to weight output #46

Closed: richardwu closed this pull request 5 years ago

richardwu commented 5 years ago

Motivation

Makes the model easier to debug and gives more visibility into its learned weights.

Example

Before: each featurizer's weights are logged as a single unlabeled, pipe-separated list.

INFO:root:featurizer FreqFeaturizer,size 11,max 0.0000,min -0.6392,avg -0.0834,abs_avg 0.0834,weight 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | -0.639 | -0.0 | -0.278 | -0.0 | -0.0 | -0.0
featurizer OccurFeaturizer,size 11,max 1.8073,min -0.4504,avg 0.3492,abs_avg 0.5782,weight 0.153 | -0.28 | -0.196 | 1.807 | 0.544 | 1.082 | -0.45 | 0.587 | 0.886 | 0.042 | -0.334
featurizer OccurFeaturizer,size 121,max 1.5835,min -0.4418,avg 0.0272,abs_avg 0.0584,weight 0.0 | 0.0 | -0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | 0.491 | -0.368 | -0.144 | 1.584 | -0.136 | 0.0 | -0.442 | 0.575 | 0.482 | 0.05 | -0.256 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | -0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | -0.136 | -0.005 | -0.028 | -0.15 | 0.63 | 0.986 | -0.157 | -0.0 | 0.301 | 0.077 | -0.067 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | 0.0
featurizer LangModelFeat,size 110,max 0.0351,min -0.0599,avg 0.0005,abs_avg 0.0031,weight 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | -0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.02 | -0.037 | 0.035 | 0.01 | 0.019 | -0.06 | 0.009 | -0.01 | -0.006 | 0.028 | 0.0 | 0.0 | -0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.006 | 0.011 | 0.015 | -0.006 | 0.012 | 0.011 | 0.007 | 0.008 | 0.0 | 0.028 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | -0.0 | 0.0 | 0.0
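The header fields in these lines (size, max, min, avg, abs_avg) are consistent with summary statistics over each featurizer's weight vector: for OccurFeaturizer, the mean of the eleven listed weights is 0.3492 and the mean of their absolute values is 0.578. The weights also appear to be rounded to three decimals. The sketch below reproduces this older format under those assumptions. `summarize_weights` is a hypothetical helper, not HoloClean's actual logging code. It also shows why so many entries read "-0.0": Python's `round` keeps the sign of a tiny negative weight.

```python
from typing import List


def summarize_weights(name: str, weights: List[float]) -> str:
    """Hypothetical helper: one 'Before'-style log line for a featurizer.

    Assumes avg is the mean weight and abs_avg is the mean absolute weight,
    which matches the numbers in the log above.
    """
    n = len(weights)
    header = "featurizer {},size {},max {:.4f},min {:.4f},avg {:.4f},abs_avg {:.4f}".format(
        name,
        n,
        max(weights),
        min(weights),
        sum(weights) / n,
        sum(abs(w) for w in weights) / n,
    )
    # Round each weight to 3 decimals; a tiny negative weight such as -0.0001
    # becomes -0.0, which is why the log is full of "-0.0" entries.
    return header + ",weight " + " | ".join(str(round(w, 3)) for w in weights)


print(summarize_weights("Toy", [0.5, -0.25, -0.0001, 0.0]))
# prints: featurizer Toy,size 4,max 0.5000,min -0.2500,avg 0.0625,abs_avg 0.1875,weight 0.5 | -0.25 | -0.0 | 0.0
```

With 121 weights on one line and no labels, the only way to tell which feature a nonzero weight belongs to is to count its position in the list.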

After: each weight is logged on its own line, prefixed by the name of the feature it belongs to.

INFO:root:featurizer InitAttFeaturizer,size 11,max 1.0000,min 1.0000,avg 1.0000,abs_avg 1.0000,weights:
Age 1.0
Workclass 1.0
Education 1.0
Maritalstatus 1.0
Occupation 1.0
Relationship 1.0
Race 1.0
Sex 1.0
HoursPerWeek 1.0
Country 1.0
Income 1.0
featurizer InitSimFeaturizer,size 11,max 0.1468,min -0.0417,avg 0.0096,abs_avg 0.0171,weights:
Age 0.0
Workclass 0.0
Education -0.0
Maritalstatus 0.0
Occupation -0.0
Relationship -0.042
Race -0.0
Sex 0.147
HoursPerWeek -0.0
Country -0.0
Income -0.0
featurizer FreqFeaturizer,size 11,max 0.0000,min -0.2355,avg -0.0263,abs_avg 0.0263,weights:
Age -0.0
Workclass 0.0
Education 0.0
Maritalstatus -0.0
Occupation 0.0
Relationship -0.236
Race -0.0
Sex -0.054
HoursPerWeek 0.0
Country 0.0
Income -0.0
featurizer OccurFeaturizer,size 11,max 1.0691,min -0.1866,avg 0.2014,abs_avg 0.2758,weights:
Age 0.17
Workclass -0.117
Education -0.039
Maritalstatus 1.069
Occupation 0.243
Relationship 0.463
Race -0.187
Sex 0.183
HoursPerWeek 0.415
Country 0.081
Income -0.068
featurizer OccurAttrFeaturizer,size 121,max 1.0272,min -0.1638,avg 0.0187,abs_avg 0.0269,weights:
Age X Age -0.0
Age X Workclass 0.0
Age X Education -0.0
Age X Maritalstatus -0.0
Age X Occupation 0.0
Age X Relationship 0.0
Age X Race 0.0
Age X Sex 0.0
Age X HoursPerWeek -0.0
Age X Country -0.0
Age X Income 0.0
Workclass X Age -0.0
Workclass X Workclass 0.0
Workclass X Education 0.0
Workclass X Maritalstatus -0.0
Workclass X Occupation -0.0
Workclass X Relationship 0.0
Workclass X Race 0.0
Workclass X Sex -0.0
Workclass X HoursPerWeek 0.0
Workclass X Country -0.0
Workclass X Income -0.0
Education X Age 0.0
Education X Workclass -0.0
Education X Education -0.0
Education X Maritalstatus -0.0
Education X Occupation -0.0
Education X Relationship -0.0
Education X Race 0.0
Education X Sex 0.0
Education X HoursPerWeek 0.0
Education X Country 0.0
Education X Income 0.0
Maritalstatus X Age -0.0
Maritalstatus X Workclass -0.0
Maritalstatus X Education 0.0
Maritalstatus X Maritalstatus -0.0
Maritalstatus X Occupation -0.0
Maritalstatus X Relationship -0.0
Maritalstatus X Race 0.0
Maritalstatus X Sex -0.0
Maritalstatus X HoursPerWeek -0.0
Maritalstatus X Country 0.0
Maritalstatus X Income -0.0
Occupation X Age 0.0
Occupation X Workclass 0.0
Occupation X Education 0.0
Occupation X Maritalstatus -0.0
Occupation X Occupation 0.0
Occupation X Relationship 0.0
Occupation X Race -0.0
Occupation X Sex 0.0
Occupation X HoursPerWeek -0.0
Occupation X Country -0.0
Occupation X Income 0.0
Relationship X Age 0.216
Relationship X Workclass -0.139
Relationship X Education -0.038
Relationship X Maritalstatus 1.027
Relationship X Occupation -0.032
Relationship X Relationship -0.0
Relationship X Race -0.164
Relationship X Sex 0.183
Relationship X HoursPerWeek 0.252
Relationship X Country 0.053
Relationship X Income -0.061
Race X Age -0.0
Race X Workclass -0.0
Race X Education -0.0
Race X Maritalstatus -0.0
Race X Occupation 0.0
Race X Relationship -0.0
Race X Race -0.0
Race X Sex -0.0
Race X HoursPerWeek -0.0
Race X Country -0.0
Race X Income -0.0
Sex X Age -0.044
Sex X Workclass 0.023
Sex X Education 0.007
Sex X Maritalstatus 0.052
Sex X Occupation 0.277
Sex X Relationship 0.463
Sex X Race -0.016
Sex X Sex -0.0
Sex X HoursPerWeek 0.168
Sex X Country 0.036
Sex X Income -0.002
HoursPerWeek X Age 0.0
HoursPerWeek X Workclass -0.0
HoursPerWeek X Education 0.0
HoursPerWeek X Maritalstatus 0.0
HoursPerWeek X Occupation 0.0
HoursPerWeek X Relationship -0.0
HoursPerWeek X Race 0.0
HoursPerWeek X Sex 0.0
HoursPerWeek X HoursPerWeek -0.0
HoursPerWeek X Country 0.0
HoursPerWeek X Income -0.0
Country X Age -0.0
Country X Workclass 0.0
Country X Education 0.0
Country X Maritalstatus 0.0
Country X Occupation 0.0
Country X Relationship 0.0
Country X Race -0.0
Country X Sex -0.0
Country X HoursPerWeek 0.0
Country X Country -0.0
Country X Income 0.0
Income X Age 0.0
Income X Workclass 0.0
Income X Education 0.0
Income X Maritalstatus -0.0
Income X Occupation -0.0
Income X Relationship -0.0
Income X Race -0.0
Income X Sex -0.0
Income X HoursPerWeek 0.0
Income X Country 0.0
Income X Income 0.0
featurizer LangModelFeat,size 110,max 0.0161,min -0.0304,avg -0.0005,abs_avg 0.0014,weights:
Age_emb_0 -0.0
Age_emb_1 0.0
Age_emb_2 -0.0
Age_emb_3 -0.0
Age_emb_4 0.0
Age_emb_5 -0.0
Age_emb_6 -0.0
Age_emb_7 -0.0
Age_emb_8 -0.0
Age_emb_9 -0.0
Workclass_emb_0 0.0
Workclass_emb_1 0.0
Workclass_emb_2 0.0
Workclass_emb_3 0.0
Workclass_emb_4 0.0
Workclass_emb_5 -0.0
Workclass_emb_6 0.0
Workclass_emb_7 -0.0
Workclass_emb_8 0.0
Workclass_emb_9 0.0
Education_emb_0 -0.0
Education_emb_1 -0.0
Education_emb_2 0.0
Education_emb_3 0.0
Education_emb_4 0.0
Education_emb_5 -0.0
Education_emb_6 -0.0
Education_emb_7 -0.0
Education_emb_8 0.0
Education_emb_9 -0.0
Maritalstatus_emb_0 -0.0
Maritalstatus_emb_1 0.0
Maritalstatus_emb_2 0.0
Maritalstatus_emb_3 -0.0
Maritalstatus_emb_4 0.0
Maritalstatus_emb_5 0.0
Maritalstatus_emb_6 0.0
Maritalstatus_emb_7 0.0
Maritalstatus_emb_8 0.0
Maritalstatus_emb_9 -0.0
Occupation_emb_0 0.0
Occupation_emb_1 0.0
Occupation_emb_2 -0.0
Occupation_emb_3 0.0
Occupation_emb_4 0.0
Occupation_emb_5 -0.0
Occupation_emb_6 -0.0
Occupation_emb_7 -0.0
Occupation_emb_8 -0.0
Occupation_emb_9 0.0
Relationship_emb_0 0.006
Relationship_emb_1 -0.01
Relationship_emb_2 0.016
Relationship_emb_3 0.008
Relationship_emb_4 -0.009
Relationship_emb_5 -0.03
Relationship_emb_6 -0.004
Relationship_emb_7 -0.011
Relationship_emb_8 0.011
Relationship_emb_9 -0.019
Race_emb_0 -0.0
Race_emb_1 -0.0
Race_emb_2 0.0
Race_emb_3 0.0
Race_emb_4 0.0
Race_emb_5 0.0
Race_emb_6 -0.0
Race_emb_7 0.0
Race_emb_8 -0.0
Race_emb_9 -0.0
Sex_emb_0 0.003
Sex_emb_1 -0.002
Sex_emb_2 -0.003
Sex_emb_3 0.0
Sex_emb_4 0.0
Sex_emb_5 -0.001
Sex_emb_6 -0.001
Sex_emb_7 0.004
Sex_emb_8 -0.006
Sex_emb_9 -0.007
HoursPerWeek_emb_0 0.0
HoursPerWeek_emb_1 -0.0
HoursPerWeek_emb_2 -0.0
HoursPerWeek_emb_3 0.0
HoursPerWeek_emb_4 0.0
HoursPerWeek_emb_5 0.0
HoursPerWeek_emb_6 -0.0
HoursPerWeek_emb_7 -0.0
HoursPerWeek_emb_8 -0.0
HoursPerWeek_emb_9 -0.0
Country_emb_0 -0.0
Country_emb_1 0.0
Country_emb_2 0.0
Country_emb_3 0.0
Country_emb_4 -0.0
Country_emb_5 0.0
Country_emb_6 -0.0
Country_emb_7 0.0
Country_emb_8 0.0
Country_emb_9 -0.0
Income_emb_0 -0.0
Income_emb_1 -0.0
Income_emb_2 0.0
Income_emb_3 0.0
Income_emb_4 0.0
Income_emb_5 0.0
Income_emb_6 0.0
Income_emb_7 -0.0
Income_emb_8 -0.0
Income_emb_9 0.0
featurizer ConstraintFeat,size 4,max -0.0114,min -0.6993,avg -0.2968,abs_avg 0.2968,weights:
fixed pred: t1."Relationship"='husband', violation pred: t1."Sex"='female' -0.285
fixed pred: t1."Sex"='female', violation pred: t1."Relationship"='husband' -0.699
fixed pred: t1."Relationship"='wife', violation pred: t1."Sex"='male' -0.191
fixed pred: t1."Sex"='male', violation pred: t1."Relationship"='wife' -0.011
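The labels above follow visible patterns. Per-attribute featurizers use the attribute name. The 121-weight OccurAttrFeaturizer uses every ordered pair of the 11 attributes ("Age X Workclass"). The 110-weight LangModelFeat uses one label per embedding dimension, 10 per attribute ("Age_emb_0" through "Age_emb_9"). The sketch below generates labels of these forms and prints them next to their weights. It is inferred from the log alone. `pairwise_names`, `embedding_names`, and `format_named_weights` are hypothetical helpers, not the functions added in this PR.

```python
from itertools import product
from typing import List


def pairwise_names(attrs: List[str]) -> List[str]:
    # Hypothetical: one label per ordered attribute pair, outer attribute first,
    # e.g. "Age X Age", "Age X Workclass", ...
    return ["{} X {}".format(a, b) for a, b in product(attrs, attrs)]


def embedding_names(attrs: List[str], dim: int) -> List[str]:
    # Hypothetical: one label per embedding dimension of each attribute,
    # e.g. "Age_emb_0" ... "Age_emb_9" when dim=10.
    return ["{}_emb_{}".format(a, i) for a in attrs for i in range(dim)]


def format_named_weights(name: str, feature_names: List[str], weights: List[float]) -> str:
    """Hypothetical helper: an 'After'-style block, one 'name weight' line per feature."""
    if len(feature_names) != len(weights):
        raise ValueError("expected exactly one feature name per weight")
    n = len(weights)
    header = "featurizer {},size {},max {:.4f},min {:.4f},avg {:.4f},abs_avg {:.4f},weights:".format(
        name, n, max(weights), min(weights),
        sum(weights) / n, sum(abs(w) for w in weights) / n)
    # Pair each label with its weight, rounded to 3 decimals as in the log.
    lines = ["{} {}".format(f, round(w, 3)) for f, w in zip(feature_names, weights)]
    return "\n".join([header] + lines)


ATTRS = ["Age", "Workclass", "Education", "Maritalstatus", "Occupation", "Relationship",
         "Race", "Sex", "HoursPerWeek", "Country", "Income"]
print(len(pairwise_names(ATTRS)), len(embedding_names(ATTRS, 10)))  # prints: 121 110
print(format_named_weights("OccurFeaturizer", ["Age", "Sex"], [0.5, -0.25]))
```

With the 11 attributes from this log, the two generators yield 121 and 110 labels, matching the `size` fields of OccurAttrFeaturizer and LangModelFeat. Requiring one name per weight means a line such as `Relationship X Maritalstatus 1.027` maps to a specific feature rather than to an anonymous position in a 121-entry list.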