elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

Helper functions for managing MLModels in Elasticsearch #192

Open sethmlarson opened 4 years ago

sethmlarson commented 4 years ago

Useful when you've trained your model on a development cluster and want to export and reimport into a production cluster.

I was thinking of a to_json() that takes a buf and has a compress argument to gzip-compress the output. Then we can have MLModel.from_json() as the import function.
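A rough sketch of what that round trip might look like, assuming the model definition is already available as a plain dict. The function names, the signature, and the example definition below are all hypothetical and are not part of eland's current API:

```python
import gzip
import io
import json


def to_json(model_definition, buf, compress=False):
    """Write a model definition (a plain dict) to a binary buffer as JSON.

    Hypothetical helper: when compress is True the JSON bytes are gzip-compressed.
    """
    payload = json.dumps(model_definition, sort_keys=True).encode("utf-8")
    if compress:
        payload = gzip.compress(payload)
    buf.write(payload)


def from_json(buf):
    """Read a model definition back, detecting gzip by its two magic bytes."""
    payload = buf.read()
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        payload = gzip.decompress(payload)
    return json.loads(payload.decode("utf-8"))


# Made-up definition for illustration; the real structure would come from Elasticsearch.
definition = {"model_id": "example-model", "feature_names": ["f0", "f1"]}

buf = io.BytesIO()
to_json(definition, buf, compress=True)  # export on the development side
buf.seek(0)
restored = from_json(buf)  # import on the production side
```

Detecting gzip from the magic bytes means the import side would not need its own compress flag, though an explicit argument would work just as well.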

cc @Winterflower @stevedodson for additional ideas here

stevedodson commented 4 years ago

@sethmlarson - this is a great area to add some really useful APIs. Maybe we should think about not just model import/export but a sub-area of features around model management.

For instance, it would be really useful to:

@tveasey may have more as well

We will have some of this management in Kibana over time, but being able to perform low-level admin tasks and dig into the details of model structure and debugging would be really useful in eland.

Also, I do like storing as compressed JSON rather than pickle, but we should also make sure the MLModel APIs feel familiar to sklearn users.

tveasey commented 4 years ago

This is a great initiative.

> @tveasey may have more as well

One thing that springs to mind is that we have a lot of training information now written into the cluster: validation loss curves, hyperparameters chosen in each optimisation round, round duration, etc. We plan to expose this information in a Kibana dashboard, but I also think it makes sense for it to be available in, say, a notebook if that's your preferred environment.

We should also have a think about exposing quality measures for different models. This likely needs additional work on the ES side, but it is part of our longer-term vision around workflow.