ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0
251 stars, 109 forks

Support User Provided Training Data #637

Closed: zhangpengshan closed this issue 5 years ago

zhangpengshan commented 5 years ago

Support user-provided training data as a replacement for the norm outputs. The user needs to do the following:

  1. Set the user-specified training data in ModelConfig#dataSet#dataPath, and set the header.
  2. Run stats to generate a valid ColumnConfig.json.
  3. Skip Norm; it does not need to be run.
  4. Run VarSelect so that all variables are selected: either use ForceSelect, or set varSelect#filterNum to a value greater than the number of features.
  5. Point the training step at the user-provided data by setting the custom paths: "train" : { ... "customPaths" : { "normalizedDataPath" : "", "cleanedDataPath" : "" } }
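
Step 5 can be sketched as a small script that fills in train#customPaths in a ModelConfig-style JSON document. This is only an illustration, not part of Shifu: the helper name `set_custom_training_paths` and the example paths are hypothetical, and the rest of the "train" section (elided as "..." above) is left untouched.

```python
import copy
import json


def set_custom_training_paths(model_config, normalized_path, cleaned_path):
    """Return a copy of model_config whose train.customPaths point at user data.

    Hypothetical helper for illustration; only the two keys named in
    step 5 are written, every other setting is preserved as-is.
    """
    updated = copy.deepcopy(model_config)
    train = updated.setdefault("train", {})
    # customPaths may be missing or null in an existing config
    custom_paths = train.get("customPaths") or {}
    custom_paths["normalizedDataPath"] = normalized_path
    custom_paths["cleanedDataPath"] = cleaned_path
    train["customPaths"] = custom_paths
    return updated


if __name__ == "__main__":
    # Minimal stand-in for a ModelConfig.json; a real file has many more fields.
    config = {"train": {"customPaths": {}}}
    patched = set_custom_training_paths(
        config,
        "/user/example/training_data",  # hypothetical path
        "/user/example/training_data",  # hypothetical path
    )
    print(json.dumps(patched, indent=2))
```

In practice the dictionary would be loaded from and written back to ModelConfig.json with `json.load` and `json.dump`; the copy is returned rather than mutated in place so the original config can still be compared against the patched one.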

zhangpengshan commented 5 years ago

Done in this commit: https://github.com/ShifuML/shifu/commit/c29223ad53027cec43b6f3c17c59cbf55fb327c8