cmaclell opened 7 years ago
An alternative idea is to add a new kind of feature that supports something like numpy arrays directly. Now that I'm thinking about it, this might be the best way to do it.
For example, an instance might look like the following:
{'X': np.array([1,2,3,4]), '_y': 1}
Then, internally we could do the cobweb3 thing and maintain means and stds for each of the X variables, but this would give users the flexibility to take advantage of numpy arrays if they know their data has a fixed dimension.
The general idea is to store a set of attributes and a mapping of attribute values to vector indices (for nominal counts) and a mapping of attributes to vector indices (for numeric counts) in the root.
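A minimal sketch of what those root mappings might look like (the names `nominal_index`, `numeric_index`, and `register` are illustrative, not part of the existing code):

```python
# Hypothetical sketch of the mappings stored in the root.
nominal_index = {}   # (attribute, value) -> index into the nominal count vector
numeric_index = {}   # attribute -> index into the numeric stat vectors

def register(instance):
    """Grow the root's mappings as new attributes/values are encountered."""
    for attr, value in instance.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numeric_index.setdefault(attr, len(numeric_index))
        else:
            nominal_index.setdefault((attr, value), len(nominal_index))
```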
Then each instance will be converted into a new object holding a nominal count vector and a numeric value vector, along with a mask marking which numeric attributes are present.
The concepts will store the analogous vectors: a nominal count vector, plus per-attribute count, mean, and squared-deviation vectors for the numerics.
Then incorporating an instance into a concept will be a simple vector addition (for nominals) and three operations for incrementally updating the numeric vectors, skipping those that are missing.
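As a sketch of the incorporation step, assuming a concept stores a nominal count vector plus count/mean/M2 vectors for the numerics (the `Concept` class and field names here are hypothetical), the "three operations" for the numerics would be a Welford-style incremental update, with a boolean mask skipping missing attributes:

```python
import numpy as np

class Concept:
    """Hypothetical vectorized concept; field names are illustrative."""
    def __init__(self, n_nominal, n_numeric):
        self.nominal_counts = np.zeros(n_nominal)
        self.n = np.zeros(n_numeric)      # per-attribute observation counts
        self.mean = np.zeros(n_numeric)   # incremental means
        self.m2 = np.zeros(n_numeric)     # sums of squared deviations (for std)

def incorporate(c, nominal_vec, numeric_vals, present):
    c.nominal_counts += nominal_vec                  # one vector addition
    m = present                                      # mask skips missing values
    c.n[m] += 1
    delta = numeric_vals[m] - c.mean[m]              # Welford update: ~3 ops
    c.mean[m] += delta / c.n[m]
    c.m2[m] += delta * (numeric_vals[m] - c.mean[m])
```

The std for any attribute is then recoverable as `sqrt(m2 / n)` (or `m2 / (n - 1)` for the sample estimate).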
Merging concepts will also be a vector addition (for nominals) and something like 5 operations for incrementally merging the numeric vectors.
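The "5 operations" for merging the numeric vectors would be a Chan-style parallel merge of (count, mean, M2) statistics. A sketch operating on plain arrays (function name is illustrative; the nominal merge is just `a + b`):

```python
import numpy as np

def merge_numeric(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Chan-style parallel merge of per-attribute (count, mean, M2) vectors."""
    n = n_a + n_b
    delta = mean_b - mean_a
    safe = np.where(n > 0, n, 1)                   # avoid divide-by-zero
    mean = mean_a + delta * n_b / safe
    m2 = m2_a + m2_b + delta**2 * (n_a * n_b) / safe
    return n, mean, m2
```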
To make this work we might want to create a special instance class/object, and maybe a function in the tree that takes an instance dict and returns an instance object that can be incorporated into the tree.
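That conversion function might look something like the following (names are hypothetical; this returns the raw vectors such an instance object would hold, given the root's index mappings):

```python
import numpy as np

def vectorize(instance, nominal_index, numeric_index):
    """Convert an instance dict into fixed-size vectors plus a presence mask."""
    nominal_vec = np.zeros(len(nominal_index))
    numeric_vals = np.zeros(len(numeric_index))
    present = np.zeros(len(numeric_index), dtype=bool)
    for attr, value in instance.items():
        if attr in numeric_index:
            j = numeric_index[attr]
            numeric_vals[j] = value
            present[j] = True
        elif (attr, value) in nominal_index:
            nominal_vec[nominal_index[(attr, value)]] += 1
        # unseen (attr, value) pairs would need to grow the root's mappings
    return nominal_vec, numeric_vals, present
```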
Then things like computing the expected correct guesses can be done with a simple vector dot product. If we keep everything in numpy arrays I expect we should see a huge performance gain.
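For the nominal part, the dot product falls out directly: with a flattened vector of attribute-value counts and the concept's instance count, the sum of squared probabilities is `p @ p`. A sketch (assuming this covers only the nominal contribution to expected correct guesses):

```python
import numpy as np

def expected_correct_guesses_nominal(nominal_counts, n_instances):
    """Sum over all (attribute, value) pairs of P(value)^2, as one dot product."""
    p = nominal_counts / n_instances
    return p @ p
```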