IBMPredictiveAnalytics / Model_Random_Forest

Classification and regression based on a forest of trees using random inputs, utilizing conditional inference trees as base learners.
Apache License 2.0

No Output when an existing stream has the same variable names used by Random Forest #2

Open hangtime79 opened 9 years ago

hangtime79 commented 9 years ago

Error: Random Forest returns no output when its output variable names already exist in the stream.

Current Behavior: The Random Forest output uses the variable names "$C-Churn_Integer" and "$CC-Churn_Integer". If these variables already exist, the stream completes but produces no output. In the stream shown below, a C5 model node sits before the Random Forest node. Because both nodes use the same variable names, no output is returned. If the order is reversed, with Random Forest first and C5 second, C5 appends a number to the end of its variable names and all output is available.

[Image: stream with a C5 model node placed before the Random Forest node]

Workaround: Place any model nodes that may use the "$C" and "$CC" prefixes after the Random Forest model node, or rename the conflicting variables.

Code Fix Solution: Random Forest could be given new prefixes, such as RF and RFC, to distinguish its output from that of existing model nodes. However, this alone would not prevent the same error when two Random Forest nodes are used in one stream.

Desired Solution: Change the model node output to use a different prefix, such as RF. Before writing output, check whether the variable names to be used already exist, and if so roll to the next number: RF, RF-1, RF-2, etc.
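A minimal sketch of the desired behavior, written in Python for illustration only (it is not code from this repository). The function names, the "$RF"/"$RFC" prefixes, and the choice to append the number to the end of the full variable name are all assumptions. The idea is to check the names already in the stream and roll to the next free number.

```python
def unique_field_name(base, existing):
    """Return base if it is unused, otherwise the first free base-1, base-2, ..."""
    taken = set(existing)
    if base not in taken:
        return base
    suffix = 1
    while f"{base}-{suffix}" in taken:
        suffix += 1
    return f"{base}-{suffix}"


def random_forest_output_names(target, existing, prefixes=("$RF", "$RFC")):
    """Build one output name per prefix, avoiding every name already in the stream.

    The prefixes are hypothetical; the issue only suggests "RF" and "RFC".
    """
    taken = set(existing)
    names = []
    for prefix in prefixes:
        name = unique_field_name(f"{prefix}-{target}", taken)
        taken.add(name)  # reserve it so later names cannot reuse it
        names.append(name)
    return names


# Fields left in the stream by an upstream C5 node (names taken from the issue).
stream_fields = ["$C-Churn_Integer", "$CC-Churn_Integer"]

first = random_forest_output_names("Churn_Integer", stream_fields)
stream_fields += first
second = random_forest_output_names("Churn_Integer", stream_fields)

print(first)   # ['$RF-Churn_Integer', '$RFC-Churn_Integer']
print(second)  # ['$RF-Churn_Integer-1', '$RFC-Churn_Integer-1']
```

With a distinct prefix the first Random Forest node no longer clashes with C5, and the existence check covers the case of two Random Forest nodes in the same stream.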