This is a prototype for storing models that can be re-used later without the need for re-training. It currently only works with scikit-learn models.
1. The model table is created if it does not exist.
2. If a model is specified, a database lookup checks whether it exists.
   a. If it exists, it is loaded and used.
   b. If it does not exist, the model is created and stored later.
3. The model is used in the calculation.
4. If the model has not already been stored, it is stored under a specified name (optional) or an automatically suggested name, along with metadata about the model.
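The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the extension's implementation. It uses an in-memory SQLite database in place of PostgreSQL. `statistics.NormalDist` stands in for a scikit-learn estimator so the example has no third-party dependencies. The table name `model_storage` and all function names are hypothetical. Automatic name suggestion is left out.

```python
import pickle
import sqlite3
from statistics import NormalDist


def ensure_model_table(conn):
    # Create the model table if it does not exist (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_storage ("
        "name TEXT PRIMARY KEY, model BLOB NOT NULL, metadata TEXT)"
    )


def load_model(conn, name):
    # Look the model up by name; None means it has not been stored yet.
    row = conn.execute(
        "SELECT model FROM model_storage WHERE name = ?", (name,)
    ).fetchone()
    return pickle.loads(row[0]) if row else None


def store_model(conn, name, model, metadata=""):
    # Serialize the fitted model and save it along with its metadata.
    conn.execute(
        "INSERT INTO model_storage (name, model, metadata) VALUES (?, ?, ?)",
        (name, pickle.dumps(model), metadata),
    )


def predict_with_storage(conn, name, y_train, n_predictions):
    ensure_model_table(conn)
    model = load_model(conn, name)
    was_stored = model is not None
    if not was_stored:
        # Train only when the lookup misses. NormalDist is a stand-in "model".
        model = NormalDist.from_samples(y_train)
    # Use the model in the calculation (here: predict the fitted mean).
    predictions = [model.mean for _ in range(n_predictions)]
    if not was_stored:
        store_model(conn, name, model, metadata="stand-in NormalDist model")
    return predictions, was_stored


conn = sqlite3.connect(":memory:")
first = predict_with_storage(conn, "demo", [2.0, 4.0], 1)
second = predict_with_storage(conn, "demo", [100.0, 200.0], 1)
```

The second call reuses the stored model, so its new training data is ignored and no re-training happens. Note that unpickling is only safe for data written by a trusted source.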
ToDo
[ ] Helper functions for returning the out-of-bag improvement (oob_improvements_) and feature importances (feature_importances_) that just sit on top of CDB_RetrieveModelParams. This works already; just specify it like so:
[x] All declared geometries are geometry(Geometry, 4326) for general geometries, or geometry(Point, 4326) for points
[x] Existing functions in crankshaft python library called from the extension are kept at least from version N to version N+1 (to avoid breakage during upgrades).
[ ] Docs for public-facing functions are written
[x] New functions follow the naming convention CDB_NameOfFunction, where internal functions begin with an underscore
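The helper functions from the first ToDo item could be thin wrappers like the ones below. This is a rough sketch, assuming a retrieved model exposes its fitted attributes the way scikit-learn estimators do. `retrieve_model_param` is a hypothetical Python stand-in and does not reflect the actual signature of CDB_RetrieveModelParams. The attribute names are taken from the item above. scikit-learn's gradient boosting estimators name the out-of-bag attribute `oob_improvement_`, so the name may need adjusting for a real model.

```python
from types import SimpleNamespace


def retrieve_model_param(model, param_name):
    """Return one fitted attribute of a model as a plain list (hypothetical)."""
    if not hasattr(model, param_name):
        raise ValueError(f"model has no fitted attribute {param_name!r}")
    return list(getattr(model, param_name))


def feature_importances(model):
    # Thin wrapper: only fixes the parameter name.
    return retrieve_model_param(model, "feature_importances_")


def oob_improvements(model, param_name="oob_improvements_"):
    # The attribute name is an assumption taken from the text, hence overridable.
    return retrieve_model_param(model, param_name)


# Stand-in for a fitted model, with made-up values for illustration only.
fake_model = SimpleNamespace(
    feature_importances_=[0.7, 0.3],
    oob_improvements_=[0.5, 0.1],
)
importances = feature_importances(fake_model)
improvements = oob_improvements(fake_model)
```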