elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

import new third party models into elastic #697

Closed imenbkr closed 4 weeks ago

imenbkr commented 1 month ago

I want to load a third-party model (a clustering model, hdbscan) into Elastic, but I've encountered an error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[128], line 1
----> 1 es_model = MLModel.import_model(
      2   es_client= client,
      3   model_id='hdbscan',
      4   model=clusterer,
      5   feature_names=df_copy.columns
      6 )

File /opt/conda/lib/python3.10/site-packages/eland/ml/ml_model.py:380, in MLModel.import_model(cls, es_client, model_id, model, feature_names, classification_labels, classification_weights, es_if_exists, es_compress_model_definition)
    249 @classmethod
    250 def import_model(
    251     cls,
   (...)
    269     es_compress_model_definition: bool = True,
    270 ) -> "MLModel":
    271     """
    272     Transform and serialize a trained 3rd party model into Elasticsearch.
    273     This model can then be used for inference in the Elastic Stack.
   (...)
    377     >>> es_model.delete_model()
    378     """
--> 380     return cls._import_model(
    381         es_client=es_client,
    382         model_id=model_id,
    383         model=model,
    384         feature_names=feature_names,
    385         classification_labels=classification_labels,
    386         classification_weights=classification_weights,
    387         es_if_exists=es_if_exists,
    388         es_compress_model_definition=es_compress_model_definition,
    389     )

File /opt/conda/lib/python3.10/site-packages/eland/ml/ml_model.py:499, in MLModel._import_model(cls, es_client, model_id, model, feature_names, classification_labels, classification_weights, es_if_exists, es_compress_model_definition, inference_config)
    494 """
    495 Actual implementation of model import used by public API methods.
    496 """
    498 es_client = ensure_es_client(es_client)
--> 499 transformer = get_model_transformer(
    500     model,
    501     feature_names=feature_names,
    502     classification_labels=classification_labels,
    503     classification_weights=classification_weights,
    504 )
    505 serializer = transformer.transform()
    506 model_type = transformer.model_type

File /opt/conda/lib/python3.10/site-packages/eland/ml/transformers/__init__.py:39, in get_model_transformer(model, **kwargs)
     35         kwargs = {k: v for k, v in kwargs.items() if k in accepted_kwargs}
     37         return transformer(model, **kwargs)
---> 39 raise NotImplementedError(
     40     f"Importing ML models of type {type(model)}, not currently implemented"
     41 )

NotImplementedError: Importing ML models of type <class 'hdbscan.hdbscan_.HDBSCAN'>, not currently implemented

pquentin commented 4 weeks ago

Hello! Eland supports importing models from scikit-learn, XGBoost, LightGBM and PyTorch. If I'm understanding correctly, you're trying to import an hdbscan model (https://hdbscan.readthedocs.io/en/latest/index.html) and that is not supported, sorry.

imenbkr commented 4 weeks ago

Hello Quentin, I've also tried importing a model from scikit-learn (from sklearn.cluster import KMeans) and got the same error, shown below. Are regression and classification models the only types of models that we can import into Elastic?

NotImplementedError                       Traceback (most recent call last)
Cell In[39], line 1
----> 1 es_model = MLModel.import_model(
      2   es_client= client,
      3   model_id='kmeans_model',
      4   model=kmeans,
      5   feature_names=['feature_1', 'feature_2']
      6 )

File /opt/conda/lib/python3.10/site-packages/eland/ml/ml_model.py:380, in MLModel.import_model(cls, es_client, model_id, model, feature_names, classification_labels, classification_weights, es_if_exists, es_compress_model_definition)
    249 @classmethod
    250 def import_model(
    251     cls,
   (...)
    269     es_compress_model_definition: bool = True,
    270 ) -> "MLModel":
    271     """
    272     Transform and serialize a trained 3rd party model into Elasticsearch.
    273     This model can then be used for inference in the Elastic Stack.
   (...)
    377     >>> es_model.delete_model()
    378     """
--> 380     return cls._import_model(
    381         es_client=es_client,
    382         model_id=model_id,
    383         model=model,
    384         feature_names=feature_names,
    385         classification_labels=classification_labels,
    386         classification_weights=classification_weights,
    387         es_if_exists=es_if_exists,
    388         es_compress_model_definition=es_compress_model_definition,
    389     )

File /opt/conda/lib/python3.10/site-packages/eland/ml/ml_model.py:499, in MLModel._import_model(cls, es_client, model_id, model, feature_names, classification_labels, classification_weights, es_if_exists, es_compress_model_definition, inference_config)
    494 """
    495 Actual implementation of model import used by public API methods.
    496 """
    498 es_client = ensure_es_client(es_client)
--> 499 transformer = get_model_transformer(
    500     model,
    501     feature_names=feature_names,
    502     classification_labels=classification_labels,
    503     classification_weights=classification_weights,
    504 )
    505 serializer = transformer.transform()
    506 model_type = transformer.model_type

File /opt/conda/lib/python3.10/site-packages/eland/ml/transformers/__init__.py:39, in get_model_transformer(model, **kwargs)
     35         kwargs = {k: v for k, v in kwargs.items() if k in accepted_kwargs}
     37         return transformer(model, **kwargs)
---> 39 raise NotImplementedError(
     40     f"Importing ML models of type {type(model)}, not currently implemented"
     41 )

NotImplementedError: Importing ML models of type <class 'sklearn.cluster._kmeans.KMeans'>, not currently implemented

pquentin commented 4 weeks ago

Correct, only regression and classification. Elasticsearch ML does not support clustering.
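
A preflight check along the following lines could surface this limitation before calling MLModel.import_model. It is only a minimal sketch: the set of class names is an assumption based on the model types listed in the import_model docstring at the time of writing and may differ between Eland versions, and check_importable and explain are hypothetical helpers, not part of Eland.

```python
# Assumption: class names accepted by MLModel.import_model at the time of
# writing. Verify against the documentation of the installed Eland version.
ASSUMED_SUPPORTED = frozenset(
    {
        "DecisionTreeClassifier",
        "DecisionTreeRegressor",
        "RandomForestClassifier",
        "RandomForestRegressor",
        "XGBClassifier",
        "XGBRegressor",
        "XGBRanker",
        "LGBMClassifier",
        "LGBMRegressor",
    }
)


def check_importable(model) -> bool:
    """Hypothetical helper: True if the model's class name is in the assumed set."""
    return type(model).__name__ in ASSUMED_SUPPORTED


def explain(model) -> str:
    """Hypothetical helper: build a short message describing the check result."""
    name = type(model).__name__
    if check_importable(model):
        return f"{name}: looks importable (regression/classification model)"
    return f"{name}: not in the assumed supported set, import would likely fail"
```

The check compares class names only, so it is deliberately loose: a custom class that happens to share a name with a supported estimator would pass it. Clustering estimators such as KMeans or HDBSCAN are rejected, which matches the answer above.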