Convert the underlying av_counts dictionary into numpy vectors

The general idea is to store three things in the root: the set of attributes, a mapping of attribute values to vector indices (for nominal counts), and a mapping of attributes to vector indices (for numeric counts).

Then each instance is converted into a new object with the following:

nominal counts (a vector of zeros and ones)
numeric counts (a vector of numeric values, with NaN for missing values)
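
To make the conversion concrete, here is a minimal sketch of how the root's index mappings and the per-instance vectors could be built. The names `build_index` and `vectorize` are hypothetical and not part of the existing code base. The sketch assumes flat instance dicts whose values are either numbers (numeric attributes) or hashable non-numbers (nominal attributes).

```python
import numpy as np

def build_index(instances):
    """Assign a vector slot to each nominal attr-val pair and each numeric attr."""
    nominal_index, numeric_index = {}, {}
    for inst in instances:
        for attr, val in inst.items():
            # Assumption: bools and non-numbers are nominal, ints/floats numeric.
            if isinstance(val, bool) or not isinstance(val, (int, float)):
                nominal_index.setdefault((attr, val), len(nominal_index))
            else:
                numeric_index.setdefault(attr, len(numeric_index))
    return nominal_index, numeric_index

def vectorize(inst, nominal_index, numeric_index):
    """Turn an instance dict into (nominal 0/1 vector, numeric vector with NaNs)."""
    nominal = np.zeros(len(nominal_index))
    numeric = np.full(len(numeric_index), np.nan)  # NaN marks a missing value
    for attr, val in inst.items():
        if (attr, val) in nominal_index:
            nominal[nominal_index[(attr, val)]] = 1.0
        elif attr in numeric_index:
            numeric[numeric_index[attr]] = val
    return nominal, numeric

instances = [{"color": "red", "size": 2.5}, {"color": "blue"}]
nominal_index, numeric_index = build_index(instances)
nominal, numeric = vectorize(instances[1], nominal_index, numeric_index)
```

In an incremental setting the mappings would have to grow as new attribute values arrive, which means resizing the vectors already stored in the tree. This sketch sidesteps that by indexing a known set of instances up front.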

The concepts will store:

nominal counts (a vector of counts for each attr-val)
three numeric vectors used for computing incremental mean and std
a vector of counts for each numeric attribute

Incorporating an instance into a concept then becomes a simple vector addition (for nominals) plus three operations that incrementally update the numeric vectors, skipping any attributes that are missing.
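
The update described above might look like the following sketch. It is not the existing implementation. The issue does not say which three numeric vectors are meant, so this assumes a Welford-style update over a per-attribute count, a running mean, and a running sum of squared deviations (`m2`). The class name `VectorConcept` is hypothetical.

```python
import numpy as np

class VectorConcept:
    """Hypothetical concept that stores its counts as numpy vectors."""

    def __init__(self, n_nominal, n_numeric):
        self.nominal_counts = np.zeros(n_nominal)  # count per attr-val
        self.num_counts = np.zeros(n_numeric)      # observations per numeric attr
        self.means = np.zeros(n_numeric)           # running mean per numeric attr
        self.m2 = np.zeros(n_numeric)              # running sum of squared deviations

    def incorporate(self, nominal, numeric):
        # Nominals: a single vector addition.
        self.nominal_counts += nominal
        # Numerics: Welford update, applied only where a value is present.
        present = ~np.isnan(numeric)
        x = numeric[present]
        self.num_counts[present] += 1
        delta = x - self.means[present]
        self.means[present] += delta / self.num_counts[present]
        self.m2[present] += delta * (x - self.means[present])

    def std(self):
        # Population standard deviation; NaN where nothing was observed.
        with np.errstate(invalid="ignore", divide="ignore"):
            return np.sqrt(self.m2 / self.num_counts)

concept = VectorConcept(n_nominal=2, n_numeric=2)
concept.incorporate(np.array([1.0, 0.0]), np.array([2.0, np.nan]))
concept.incorporate(np.array([0.0, 1.0]), np.array([4.0, 6.0]))
```

The boolean mask `present` is what implements "skipping those that are missing": slots holding NaN are left untouched.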

Merging concepts will also be a vector addition (for nominals), plus roughly five operations for incrementally merging the numeric vectors.
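
As a sketch of the numeric merge, the pairwise formula for combining two sets of running statistics (often attributed to Chan et al.) fits in a handful of vector operations. This assumes the numeric state is kept as count, mean, and sum of squared deviations, which the issue does not confirm. The function name `merge_numeric` is hypothetical.

```python
import numpy as np

def merge_numeric(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Combine per-attribute (count, mean, m2) vectors from two concepts."""
    n = n_a + n_b
    safe_n = np.where(n > 0, n, 1)  # avoid 0/0 for attrs neither concept has seen
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / safe_n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / safe_n
    return n, mean, m2

# Concept A saw the values 2 and 4 for the first attribute, B saw 6.
# Neither concept saw the second attribute.
n, mean, m2 = merge_numeric(
    np.array([2.0, 0.0]), np.array([3.0, 0.0]), np.array([2.0, 0.0]),
    np.array([1.0, 0.0]), np.array([6.0, 0.0]), np.array([0.0, 0.0]),
)
```

The nominal side of the merge is just `counts_a + counts_b`.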

To make this work, we might want to create a special instance class/object, and maybe a function in the tree that takes an instance dict and returns an instance object that can be incorporated into the tree.

Then things like computing the expected correct guesses can be done with a simple vector dot product. If we keep everything in numpy arrays, I expect we should see a huge performance gain.
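
For the nominal attributes, the dot-product idea could be sketched as below. It assumes the expected correct guesses for nominals is the sum of squared attribute-value probabilities, P(A=v|C)^2, with probabilities taken as counts divided by the number of instances in the concept. Numeric attributes and any per-attribute normalization are left out. The function name is hypothetical.

```python
import numpy as np

def expected_correct_guesses_nominal(nominal_counts, concept_count):
    """Sum of squared attr-val probabilities, computed as one dot product."""
    if concept_count == 0:
        return 0.0
    probs = nominal_counts / concept_count
    return float(probs @ probs)

# Four instances: attribute 1 split 3/1, attribute 2 split 2/2.
score = expected_correct_guesses_nominal(np.array([3.0, 1.0, 2.0, 2.0]), 4)
```

Here `score` is 0.75² + 0.25² + 0.5² + 0.5² = 1.125.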
