ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0
249 stars 109 forks

Support Compact Norm And Train (NN, WDL, TensorFlow) #755

Closed Liu-Delin closed 3 years ago

Liu-Delin commented 3 years ago

Description

Enhance shifu norm -Dshifu.norm.only.selected=true

With this feature, the output columns keep their original order; norm no longer reorders them.

For example, given the following columns:

column1: Meta
column2: Final Selected
column3: Target
column4: Not Selected
column5: Meta

The old logic generates data in the following order:

column3: Target
column1: Meta
column5: Meta
column2: Final Selected

The new logic generates data in the following order:

column1: Meta
column2: Final Selected
column3: Target
column5: Meta
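
The two orderings above can be reproduced with a small sketch. This is only an illustration of the example in this description, not Shifu's actual implementation: the `COLUMNS` list, the role strings, and the `old_order` / `new_order` helpers are hypothetical names introduced here.

```python
# Illustrative sketch only: names and role strings are assumptions,
# not identifiers from the Shifu code base.
COLUMNS = [
    ("column1", "Meta"),
    ("column2", "Final Selected"),
    ("column3", "Target"),
    ("column4", "Not Selected"),
    ("column5", "Meta"),
]


def old_order(columns):
    """Old behaviour in the example: target first, then meta, then selected."""
    targets = [c for c in columns if c[1] == "Target"]
    metas = [c for c in columns if c[1] == "Meta"]
    selected = [c for c in columns if c[1] == "Final Selected"]
    return targets + metas + selected


def new_order(columns):
    """New behaviour in the example: drop unselected columns, keep input order."""
    return [c for c in columns if c[1] != "Not Selected"]


if __name__ == "__main__":
    print([name for name, _ in old_order(COLUMNS)])
    print([name for name, _ in new_order(COLUMNS)])
```

In both cases column4 is dropped; the only difference is whether the remaining columns are regrouped by role or left in place.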

Enhance workers to support training with compacted data

  1. Enhance NN worker.
  2. Enhance WDL worker.
  3. Enhance TensorFlow data loading.
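
Because compacted data keeps the original column order, a loader cannot assume that the target sits at a fixed position in each row. The sketch below shows one way a loader might resolve positions from column roles. It is an assumption-laden illustration, not the code in the NN, WDL, or TensorFlow workers: `compact_positions`, `split_row`, the role strings, and the sample row are all hypothetical.

```python
# Illustrative sketch only: helper names, role strings, and the sample row
# are assumptions, not taken from the Shifu workers.
COLUMNS = [
    ("column1", "Meta"),
    ("column2", "Final Selected"),
    ("column3", "Target"),
    ("column4", "Not Selected"),
    ("column5", "Meta"),
]


def compact_positions(columns):
    """Map each kept column name to its index within a compacted row."""
    kept = [name for name, role in columns if role != "Not Selected"]
    return {name: index for index, name in enumerate(kept)}


def split_row(row, columns):
    """Return (target, features) from a compacted row, ignoring meta columns."""
    positions = compact_positions(columns)
    target = None
    features = []
    for name, role in columns:
        if role == "Target":
            target = row[positions[name]]
        elif role == "Final Selected":
            features.append(row[positions[name]])
    return target, features


if __name__ == "__main__":
    # A compacted row holds values for column1, column2, column3, column5.
    sample_row = ["id-1", "0.5", "1", "2020-01-01"]
    print(split_row(sample_row, COLUMNS))
```

Looking positions up by role keeps the loader independent of where the target happens to fall in the input schema.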

Tests

  1. Unit test cases are added for NN and WDL.
  2. I manually tested NN, WDL and TensorFlow with the cancer-judgement data set for init, stats, norm and train.
  3. I also manually tested them with the cam2015 data set for norm and train.