RokIvansek closed this 8 years ago.
It turns out this does not yield a substantial speedup. Because it introduces some confusing code, I will not include it.
`getprobs` is the method that uses the `data.domain` information about unique values, and `get_probs` is the method that calls `np.unique`.
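The actual implementations are not shown in this thread, so the following is only a hypothetical sketch of the difference between the two approaches, written with plain NumPy. It assumes that discrete values are encoded as integer indices `0..n_values-1` and that `n_values` would come from something like `len(var.values)` for an Orange discrete variable; both function bodies are illustrative, not the code that was benchmarked.

```python
import numpy as np

def get_probs(column):
    # Hypothetical sketch: discover the distinct values with np.unique,
    # then normalise their counts into probabilities.
    _, counts = np.unique(column, return_counts=True)
    return counts / counts.sum()

def getprobs(column, n_values):
    # Hypothetical sketch: the number of distinct values is known in advance
    # (assumed to come from the table's domain), and values are assumed to be
    # integer codes 0..n_values-1, so a single bincount pass suffices.
    counts = np.bincount(column.astype(int), minlength=n_values)
    return counts / counts.sum()

col = np.array([0, 2, 2, 1, 0, 2], dtype=float)
print(get_probs(col))
print(getprobs(col, 3))
```

One behavioural difference worth noting: `getprobs` returns a zero probability for a value that never occurs in the column, whereas `get_probs` simply omits it. For entropy this is harmless as long as zero-probability terms are skipped.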
Testing with 100000 samples, calculating probabilities for 3 attributes:

- Time `get_probs`: 0.06428435299979658
- Time `getprobs`: 0.046929360999759716

Testing with 1000000 samples:

- Time `get_probs`: 0.691636645666828
- Time `getprobs`: 0.4972618786669045
The entropy function currently uses `np.unique` to get the unique values in the input array. This information is already present in the domain of an Orange table, so instead of running `np.unique`, we can read it from the data domain via `data.domain.variables[i].values` and `data.domain.class_var.values`. The method must then accept integers corresponding to consecutive column numbers instead of the actual arrays.
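A minimal sketch of what such an index-based entropy function might look like, using only NumPy. The signature `entropy(X, columns, n_values)` is hypothetical: `columns` are integer column indices, and `n_values[i]` stands in for `len(data.domain.variables[i].values)` (or the class variable's value count). It also assumes each column holds integer codes `0..n_values[i]-1`. Each row's combination of values is mapped to a single integer, so no `np.unique` call is needed to find the distinct combinations.

```python
import numpy as np

def entropy(X, columns, n_values):
    """Joint entropy (in bits) of the selected columns of X.

    Hypothetical signature: `columns` are integer column indices and
    `n_values[i]` is the number of distinct values of column i, assumed
    to be known from the table's domain. Values are assumed to be
    integer codes 0..n_values[i]-1.
    """
    columns = list(columns)
    dims = tuple(n_values[c] for c in columns)
    codes = X[:, columns].astype(int)
    # Combine each row's per-column codes into one integer (mixed radix),
    # replacing the np.unique search for distinct value combinations.
    flat = np.ravel_multi_index(tuple(codes.T), dims)
    counts = np.bincount(flat, minlength=int(np.prod(dims)))
    # Skip zero counts: unseen combinations contribute nothing to entropy.
    p = counts[counts > 0] / len(flat)
    return float(-np.sum(p * np.log2(p)))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(entropy(X, [0], [2, 2]))
print(entropy(X, [0, 1], [2, 2]))
```

Using a mixed-radix code keeps the counting to a single `np.bincount` pass, at the cost of allocating an array whose size is the product of the cardinalities, which can grow quickly when many columns are combined.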