elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

Add support for creating suggester fields with weights #2229

Open landlord-matt opened 1 month ago

landlord-matt commented 1 month ago

What kind of issue is this?

Feature description

Setting weights for your suggester fields allows you to prioritize the suggestions returned from a query (note that suggestions can't be sorted). With the standard API you set the weights by including a weight parameter in each document (see the sketch after the mapping below), but to my understanding this is not possible when using es-hadoop and Spark DataFrames. In Databricks I would do something like this

    # es_conf holds the connector options (es.nodes, credentials, etc.)
    # and full_index_name is the target index; both are defined elsewhere.
    (
        data.write.format("org.elasticsearch.spark.sql")
        .options(**es_conf)
        .mode("overwrite")
        .save(full_index_name)
    )

To upload to an index with this mapping:

    {
        "full_index_name": {
            "mappings": {
                "dynamic": "strict",
                "properties": {
                    "CompanyIdentifier": {
                        "type": "keyword"
                    },
                    "CompanyName": {
                        "type": "completion",
                        "analyzer": "simple",
                        "preserve_separators": true,
                        "preserve_position_increments": true,
                        "max_input_length": 50
                    }
                }
            }
        }
    }
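
For reference, with the standard API the weight is attached to each indexed document rather than to the mapping. Roughly like this (just a sketch using requests; the URL, identifier and weight value are made up, security/auth is ignored, the field names come from the mapping above):

    import requests

    # Sketch of the standard (REST) way of weighting a completion field:
    # the weight is part of each indexed document, not of the mapping.
    # URL, identifier and weight value are made up for illustration.
    requests.post(
        "http://localhost:9200/full_index_name/_doc",
        json={
            "CompanyIdentifier": "ACME-001",
            "CompanyName": {
                "input": ["Acme Corporation", "Acme"],
                "weight": 34,
            },
        },
    )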

I tried adding an int column called weight, but that didn't do it. I tried searching for a parameter in the documentation, but I couldn't find one. I haven't tried a nested field, but I doubt it would work; the kind of thing I had in mind is sketched below.
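
For completeness, the nested/struct variant would look roughly like this (untested; it just mirrors the column and field names from the mapping above and hopes the connector serializes the struct as the input/weight object the completion type expects):

    from pyspark.sql import functions as F

    # Untested sketch: wrap the completion field in a struct with the
    # input/weight shape the completion type expects, hoping es-hadoop
    # serializes it as {"input": ..., "weight": ...}.
    weighted = data.withColumn(
        "CompanyName",
        F.struct(
            F.col("CompanyName").alias("input"),          # suggestion text
            F.col("weight").cast("int").alias("weight"),  # desired priority
        ),
    ).drop("weight")

    (
        weighted.write.format("org.elasticsearch.spark.sql")
        .options(**es_conf)
        .mode("overwrite")
        .save(full_index_name)
    )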

I guess the syntax would not be trivial if there is more than one completion field either, but you'll figure something out :)