algorithmfoundry / Foundry

The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning

AbstractVectorThresholdMaximumGainLearner: Sanity check triggered #45

Open Zero3 opened 9 years ago

Zero3 commented 9 years ago

While experimenting with the parameters of the Random Forest example from #6, I triggered a sanity check in AbstractVectorThresholdMaximumGainLearner that presumably should never fire:

java.lang.RuntimeException: bestThreshold (8.30760652058587) lies outside range of values (8.30760652058587, 9.14680325466277]
    at gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner.computeBestGainAndThreshold(AbstractVectorThresholdMaximumGainLearner.java:383)
    at gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner.computeBestGainAndThreshold(AbstractVectorThresholdMaximumGainLearner.java:209)
    at gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner.learn(AbstractVectorThresholdMaximumGainLearner.java:141)
    at gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner.learn(AbstractVectorThresholdMaximumGainLearner.java:45)
    at gov.sandia.cognition.learning.algorithm.tree.RandomSubVectorThresholdLearner.learn(RandomSubVectorThresholdLearner.java:212)
    at gov.sandia.cognition.learning.algorithm.tree.RandomSubVectorThresholdLearner.learn(RandomSubVectorThresholdLearner.java:47)
    at gov.sandia.cognition.learning.algorithm.tree.CategorizationTreeLearner.learnNode(CategorizationTreeLearner.java:237)
    at gov.sandia.cognition.learning.algorithm.tree.CategorizationTreeLearner.learnNode(CategorizationTreeLearner.java:37)
    at gov.sandia.cognition.learning.algorithm.tree.AbstractDecisionTreeLearner.learnChildNodes(AbstractDecisionTreeLearner.java:129)
    at gov.sandia.cognition.learning.algorithm.tree.CategorizationTreeLearner.learnNode(CategorizationTreeLearner.java:246)
    at gov.sandia.cognition.learning.algorithm.tree.CategorizationTreeLearner.learn(CategorizationTreeLearner.java:178)
    at gov.sandia.cognition.learning.algorithm.tree.CategorizationTreeLearner.learn(CategorizationTreeLearner.java:37)
    at gov.sandia.cognition.learning.algorithm.ensemble.AbstractBaggingLearner.step(AbstractBaggingLearner.java:195)
    at gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner.learn(AbstractAnytimeBatchLearner.java:147)
    ...
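
A note on the notation in the message: `(a, b]` is a half-open interval, so a threshold exactly equal to the lower bound counts as out of range, which matches the two identical printed values. The sketch below shows such a check in isolation. `RangeCheckSketch` and `inHalfOpenRange` are hypothetical names made up for illustration, not taken from the Foundry source, and the sketch does not claim to explain why the learner arrived at that threshold.

```java
// Hypothetical illustration only; not the Foundry implementation.
final class RangeCheckSketch
{
    /** Returns true when min < value <= max, i.e. value lies in (min, max]. */
    static boolean inHalfOpenRange(double value, double min, double max)
    {
        return value > min && value <= max;
    }

    public static void main(String[] args)
    {
        // Bounds copied from the exception message above.
        double min = 8.30760652058587;
        double max = 9.14680325466277;

        // A threshold equal to the lower bound is outside (min, max].
        System.out.println(inHalfOpenRange(min, min, max)); // prints false

        // A threshold equal to the upper bound is inside (min, max].
        System.out.println(inHalfOpenRange(max, min, max)); // prints true
    }
}
```

With a check of this shape, any split threshold that coincides with the smallest observed value is rejected, even though it is numerically indistinguishable from it in the printed message.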

jbasilico commented 9 years ago

Do you have an example input for this?

Zero3 commented 9 years ago

I can reproduce it by changing these two lines in the example:

int maxDepth = 10;
int minLeafSize = 10;

to:

int maxDepth = 5;
int minLeafSize = 5;

I use the input data I posted at https://gist.github.com/Zero3/55963dcf14c87e439668 which can be deserialized from a file using something like this:

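// Imports needed: java.io.*, java.util.Collection,
// gov.sandia.cognition.learning.data.InputOutputPair, gov.sandia.cognition.math.matrix.Vector
Collection<InputOutputPair<Vector, String>> trainData;
try (ObjectInput input = new ObjectInputStream(new BufferedInputStream(new FileInputStream("algorithmfoundry-Foundry-issues-45.ser"))))
{
    // Unchecked cast: the file is expected to hold the serialized training collection.
    @SuppressWarnings("unchecked")
    Collection<InputOutputPair<Vector, String>> loaded = (Collection<InputOutputPair<Vector, String>>) input.readObject();
    trainData = loaded;
}
catch (IOException | ClassNotFoundException ex)
{
    throw new RuntimeException(ex);
}
// trainData is definitely assigned here and can be passed to the learner.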

(Note that my OutputType is String while the example uses Boolean)
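
For anyone who wants to try the load pattern without the gist file, here is a minimal round-trip sketch that uses only the JDK. Everything in it is an assumption for illustration: `SerRoundTripSketch` and `roundTrip` are hypothetical names, the data is serialized to an in-memory byte array instead of a file, and plain `double[]`/`String` pairs stand in for Foundry's `InputOutputPair<Vector, String>`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;

// Hypothetical stand-in for the file-based example; not Foundry code.
final class SerRoundTripSketch
{
    /** Serializes the collection to bytes and reads it back, as the file-based code would. */
    @SuppressWarnings("unchecked")
    static ArrayList<SimpleEntry<double[], String>> roundTrip(
        ArrayList<SimpleEntry<double[], String>> data)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes))
            {
                out.writeObject(data);
            }
            try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
            {
                // Unchecked cast, same as in the file-based snippet.
                return (ArrayList<SimpleEntry<double[], String>>) in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException ex)
        {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args)
    {
        ArrayList<SimpleEntry<double[], String>> data = new ArrayList<>();
        data.add(new SimpleEntry<>(new double[] {8.3, 9.1}, "classA"));
        data.add(new SimpleEntry<>(new double[] {7.2, 6.4}, "classB"));

        ArrayList<SimpleEntry<double[], String>> loaded = roundTrip(data);
        System.out.println(loaded.size() + " pairs, first label: " + loaded.get(0).getValue());
    }
}
```

Java serialization preserves `double` values bit for bit, so a file written this way reproduces the training inputs exactly.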

Zero3 commented 9 years ago

(Please note that the test case above uses the same wrong parameter names as used in the example in #6)

Zero3 commented 9 years ago

I did some further testing with your new RandomForestFactory. I can consistently trigger the sanity check with minLeafSize = {2, 3, 4} when maxTreeDepth > 1.

jbasilico commented 9 years ago

Yes, I still need to look into this. Have you seen it happen when minLeafSize = 0?

Zero3 commented 9 years ago

java.lang.IllegalArgumentException: minSplitSize must be positive (was 0).
    at gov.sandia.cognition.util.ArgumentChecker.assertIsPositive(ArgumentChecker.java:61)
    at gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner.setMinSplitSize(AbstractVectorThresholdMaximumGainLearner.java:457)
    at gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner.<init>(AbstractVectorThresholdMaximumGainLearner.java:86)
    at gov.sandia.cognition.learning.algorithm.tree.VectorThresholdInformationGainLearner.<init>(VectorThresholdInformationGainLearner.java:78)
    [...]
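
The trace shows the value 0 being rejected in the learner's constructor, before any training runs, so the sanity check cannot be tested with that setting. Below is a minimal sketch of a guard that would produce the same message. `PositiveCheckSketch` is a hypothetical class, and its `assertIsPositive` only mirrors the method name and message text visible in the trace. It is not the actual `ArgumentChecker` source.

```java
// Hypothetical re-creation of the kind of guard seen in the trace; not Foundry code.
final class PositiveCheckSketch
{
    /** Throws IllegalArgumentException unless value is strictly greater than zero. */
    static void assertIsPositive(String argument, int value)
    {
        if (value <= 0)
        {
            throw new IllegalArgumentException(
                argument + " must be positive (was " + value + ").");
        }
    }

    public static void main(String[] args)
    {
        assertIsPositive("minSplitSize", 1); // passes silently
        try
        {
            assertIsPositive("minSplitSize", 0);
        }
        catch (IllegalArgumentException ex)
        {
            System.out.println(ex.getMessage()); // prints minSplitSize must be positive (was 0).
        }
    }
}
```

Under a guard like this, the smallest value that can be tried is 1.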