elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

LTR feature logger #648

Closed afoucret closed 6 months ago

afoucret commented 6 months ago

This PR adds tools for LTR (learning to rank) query feature extraction to Eland.

  1. Base classes have been defined to declare the LTR config:
ltr_config = LTRModelConfig(
  feature_extractors=[
    QueryFeatureExtractor(
      feature_name="title_bm25", 
      query={"match": {"title": "{{query}}"}},
    ),
    QueryFeatureExtractor(
      feature_name="popularity",
      query={
        "script_score": {
          "query": {"exists": {"field": "popularity"}},
          "script": {"source": "return doc['popularity'].value"},
        }
      },
    )
  ]
)
  2. A feature logger can be created using (passing the `ltr_config` declared in the previous step):
feature_logger = FeatureLogger(es_client="http://localhost:9200", es_index="my-index", ltr_model_config=ltr_config)
  3. Document features can be extracted for a set of query params using:
feature_logger.extract_features(
  query_params={"query": "my search query"},
  doc_ids=["doc-1", "doc-2"],
)
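
To illustrate what the templated queries above amount to once `query_params` are applied, here is a minimal, self-contained sketch in plain Python. It is not Eland's implementation: `render_template` and `build_feature_queries` are hypothetical helpers, the placeholder substitution is a plain string replace rather than full Mustache rendering, and wrapping each feature query in a `bool` query with an `ids` filter is only one assumed way to restrict scoring to the requested documents.

```python
import json
from typing import Any, Dict, List


def render_template(node: Any, params: Dict[str, Any]) -> Any:
    """Return a copy of `node` with {{name}} placeholders replaced in every string.

    Simplification (assumption): plain string replacement, not real Mustache.
    """
    if isinstance(node, str):
        for key, value in params.items():
            node = node.replace("{{" + key + "}}", str(value))
        return node
    if isinstance(node, dict):
        return {k: render_template(v, params) for k, v in node.items()}
    if isinstance(node, list):
        return [render_template(v, params) for v in node]
    return node  # numbers, booleans, None are returned unchanged


def build_feature_queries(
    extractors: Dict[str, Dict[str, Any]],
    query_params: Dict[str, Any],
    doc_ids: List[str],
) -> Dict[str, Dict[str, Any]]:
    """Build one query per feature, limited to the given document ids.

    Hypothetical helper: the feature query goes in `must` (it produces the
    score), and an `ids` filter keeps only the requested documents.
    """
    ids_filter = {"ids": {"values": list(doc_ids)}}
    return {
        name: {
            "bool": {
                "must": render_template(query, query_params),
                "filter": ids_filter,
            }
        }
        for name, query in extractors.items()
    }


# Same feature definitions as in the LTR config above, as plain dicts.
extractors = {
    "title_bm25": {"match": {"title": "{{query}}"}},
    "popularity": {
        "script_score": {
            "query": {"exists": {"field": "popularity"}},
            "script": {"source": "return doc['popularity'].value"},
        }
    },
}

queries = build_feature_queries(
    extractors,
    query_params={"query": "my search query"},
    doc_ids=["doc-1", "doc-2"],
)
print(json.dumps(queries, indent=2))
```

Each resulting query could be sent to the index to obtain one score per document, which would then serve as that document's value for the feature. The template definitions are left untouched, so the same `extractors` can be reused for other query params.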