Open daixque opened 1 month ago
Pinging @elastic/ml-core (Team:ML)
@daixque As long as the feature is numeric, there is a workaround: write a script_score-based query feature extractor that returns the param directly.
In your case, here is what it may look like in Eland:
QueryFeatureExtractor(
    feature_name="user_age",
    query={
        "script_score": {
            "query": {"match_all": {}},
            "script": {"source": "return params.user_age"},
        }
    },
),
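For illustration, here is a minimal sketch of the search request body that could accompany this workaround. It assumes the extractor's script can read user_age from the params passed to the learning_to_rank rescorer, which is not confirmed in this thread. The helper name, the model ID, and the title field are hypothetical.

```python
import json


def build_ltr_search_body(model_id, query_text, user_age, window_size=100):
    """Build a search body whose rescorer passes user_age as a param.

    Hypothetical helper: the "title" field and the param names are
    assumptions for illustration, not taken from the issue.
    """
    return {
        # First-pass retrieval query (field name is an assumption).
        "query": {"match": {"title": query_text}},
        "rescore": {
            "learning_to_rank": {
                "model_id": model_id,
                # Values supplied by the application at query time.
                "params": {"query_text": query_text, "user_age": user_age},
            },
            "window_size": window_size,
        },
    }


body = build_ltr_search_body("ltr-model-example", "star wars", 30)
print(json.dumps(body, indent=2))
```

The resulting dictionary could then be sent as the body of a search request; whether the script actually sees the value depends on how the feature extractor template is resolved.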
Hi @afoucret, thank you for your comment. I'm aware that this kind of workaround can be used, but I feel it's not intuitive (and it may not be performant for building a training dataset). So it would be great if Eland and Elasticsearch supported it natively.
Overview
As of 8.13, the learning to rank functionality of Elasticsearch and Eland only supports feature variables that are associated with field data in the Elasticsearch index.
But sometimes a user may need to train the model with feature values that are provided directly rather than as field data. Elasticsearch and Eland should be able to accept feature values that do not depend on field data.
For example, our notebook shows how to implement a search app for movie data. In this example, all feature values are provided by Elasticsearch, such as a BM25 score and/or the result of a script score. But sometimes users want to train their model with data that comes from outside Elasticsearch. A typical example would be user profile data such as age and/or gender, because it is not related to each document (in this case, each movie).
Model training with Eland
At the moment, LTRModelConfig only accepts a list of QueryFeatureExtractor, but in the new version of Eland it should also accept another extractor that represents a direct feature value not associated with any field data of the index.
Elasticsearch learning to rank query
When an application issues the query, feature values should be passed directly to Elasticsearch. It may look like rescore.learning_to_rank.params.user_age in the example below: