databrickslabs / automl-toolkit

Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

feature interaction: evaluation scoring on original input fields is too slow #23

Open HCMY opened 2 years ago

HCMY commented 2 years ago

Hey guys, I'm reading the source code, and I'd sincerely like to thank you for all the work you've done and for making all of the code public. However, I noticed that some of the code in `FeatureInteraction` runs very slowly, for example:

```scala
// Score each nominal field, one after another.
val nominalScores = nominalFields.map { x =>
  x -> ColumnScoreData(
    scoreColumn(
      df,
      modelType,
      x,
      getFieldType("nominal"),
      totalRecordCount
    ),
    "nominal"
  )
}.toMap

// Score each continuous field the same way, again sequentially.
val continuousScores = continuousFields.map { x =>
  x -> ColumnScoreData(
    scoreColumn(
      df,
      modelType,
      x,
      getFieldType("continuous"),
      totalRecordCount
    ),
    "continuous"
  )
}.toMap
```

Are there any suggestions for parallelizing this? Looking forward to your reply!
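One common way to parallelize a loop like this, sketched here under stated assumptions, is to submit the per-field scoring calls from a bounded thread pool: Spark can run several jobs at the same time within one application when they are submitted from separate threads. The object `ConcurrentFieldScoring`, its method `scoreFieldsConcurrently`, and the `maxParallelism` parameter are hypothetical names, not part of automl-toolkit, and this is not the toolkit's own implementation. The `score` function stands in for a call such as `scoreColumn(...)`, whose exact signature is assumed.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper, not part of automl-toolkit.
object ConcurrentFieldScoring {

  /** Applies `score` to every field on a bounded thread pool and returns
    * the results keyed by field name. `score` is assumed to wrap a call
    * like scoreColumn(df, modelType, field, ...) and to be thread-safe. */
  def scoreFieldsConcurrently[S](fields: Seq[String], maxParallelism: Int)(
      score: String => S
  ): Map[String, S] = {
    require(maxParallelism > 0, "maxParallelism must be positive")
    // A dedicated fixed-size pool caps how many scoring calls
    // (and therefore Spark jobs) are in flight at once.
    val pool = Executors.newFixedThreadPool(maxParallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each Future runs `score` on a pool thread; Spark's scheduler
      // accepts jobs submitted from multiple threads of one application.
      val futures = fields.map(field => Future(field -> score(field)))
      // Block until every field has been scored, then build the map.
      Await.result(Future.sequence(futures), Duration.Inf).toMap
    } finally {
      // Release the pool threads once all results are collected.
      pool.shutdown()
    }
  }
}
```

For the nominal fields above, the existing `ColumnScoreData(scoreColumn(...), "nominal")` expression could be passed as `score`, with `nominalFields.toSeq` as the input. Whether this helps depends on the cluster having idle capacity while each job runs. Caching `df` beforehand (`df.cache()`) avoids recomputing its lineage in every job, and setting `spark.scheduler.mode` to `FAIR` lets the concurrent jobs share executors more evenly.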