elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

Hive "es.mapping.routing" Error #1180

Open myamor163 opened 6 years ago

myamor163 commented 6 years ago

elasticsearch-hadoop 6.0.0, Hive external table, setting one:
'es.mapping.join'='my_join_field' #my_join_field type map<string,string>; es.mapping.routing not set

 Error when writing to the table:
     "Field [my_join_field] needs to be a primitive; found [map<string,string>]"

Hive external table, setting two:

 'es.mapping.join'='my_join_field'   #my_join_field type map<string,string>
 'es.mapping.routing'='<4>'   #Specify a constant or a field

  Writing to the table succeeds.

This does not match the stated behavior: "- Added a chained field extractor that checks for routing first, and if a routing has not been set, will search for the join field's parent id."

jbaiera commented 6 years ago

Join and Routing fields must be primitive fields, ideally string data. It's not clear to me how a map of strings can make sense as a join field. The second run you had is using a constant value of 4 which is a primitive value, and thus preempts even checking for a join field value.

If you still feel like this is an error, please let me know, but this looks like it's functioning as intended.

myamor163 commented 6 years ago

@jbaiera I don't think so. The join property can only be set to a string when the document is a parent; when the document is a child, it is an object. How do you express that object in Hive? If the child document's join field is set to a string, the write fails because the parent cannot be found.

In addition, please take a look at the documentation; I don't think I have misunderstood it: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

es.mapping.join (default none) The document field/property name containing the document’s join field. Constants are not accepted. Join fields on a document must contain either the parent relation name as a string, or an object that contains the child relation name and the id of its parent. If a child document is identified when using this setting, the document’s routing is automatically set to the parent id if no other routing is configured in es.mapping.routing.

es.mapping.routing (default depends on es.mapping.join) The document field/property name containing the document routing. To specify a constant, use the <CONSTANT> format. If a join field is specified with es.mapping.join, then this defaults to the value of the join field’s parent id. If a join field is not specified, then this defaults to none.
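The resolution order described in that documentation can be modelled in a few lines of Python. This is only a sketch of the documented behavior, not ES-Hadoop's actual extractor code: the function name `resolve_routing`, the plain-dict settings, and the handling of a missing routing field are all assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Optional


def resolve_routing(doc: Mapping[str, Any], settings: Mapping[str, str]) -> Optional[str]:
    """Model the documented order: explicit routing first, then the join field's parent id."""
    routing = settings.get("es.mapping.routing")
    if routing is not None:
        # A value wrapped in angle brackets, such as "<4>", denotes a constant.
        if routing.startswith("<") and routing.endswith(">"):
            return routing[1:-1]
        # Otherwise it names a document field (assumption: a missing field gives None).
        value = doc.get(routing)
        return None if value is None else str(value)

    join_field = settings.get("es.mapping.join")
    if join_field is None:
        return None  # no join field configured: routing defaults to none

    join_value = doc.get(join_field)
    if isinstance(join_value, Mapping):
        # Child form: an object carrying the relation name and the parent id.
        parent = join_value.get("parent")
        return None if parent is None else str(parent)
    return None  # parent form: a plain relation name, nothing to route by
```

Under this model, setting 'es.mapping.routing'='<4>' returns before the join field is ever inspected, which is consistent with the second table definition writing without error, while the first definition depends entirely on the join field being readable as an object.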

jbaiera commented 6 years ago

Ah, it seems you are correct. I had forgotten about the child data format case. I'll take a look again.

AdvithSubbu commented 4 years ago

What should the value of es.mapping.join be (for both parent and child)?

Example:

  1. My index

PUT test_tree
{
  "mappings": {
    "properties": {
      "fam_details": {
        "type": "join",
        "relations": {
          "danfamily": ["comfamily", "enterfamily"],
          "comfamily": "richardfamily"
        }
      }
    }
  }
}

  2. Data inserted from Spark 2 using ES-Hadoop, with the following mapping settings

Parent insert

esconf["es.mapping.id"] = "uniqueid"
esconf["es.mapping.join"] = "famdetails"  /* also tried danfamily */

Child insert

esconf["es.mapping.id"] = "uniqueid of child"
esconf["es.mapping.join"] = "comfamily"
esconf["es.mapping.routing"] = "uniqueid" (same field as parent)

I am able to insert data from Spark and to search it, but I get no results when I query with has_parent or has_child.

Any help?

@myamor163, can you please help with this setting?
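Going by the documentation quoted earlier in this thread, es.mapping.join appears to expect the name of the document field that holds the join value (here fam_details from the test_tree mapping), not a relation name such as comfamily. A minimal Python sketch of the document shapes that implies is below. The ids "p1" and "c1" and the helper names `parent_doc` and `child_doc` are hypothetical, and leaving es.mapping.routing unset relies on the documented default of routing to the parent id.

```python
import json

JOIN_FIELD = "fam_details"  # join field name taken from the test_tree mapping


def parent_doc(doc_id, relation):
    """Parent form: the join field holds only the relation name."""
    return {"uniqueid": doc_id, JOIN_FIELD: relation}


def child_doc(doc_id, relation, parent_id):
    """Child form: the join field is an object with the relation name and parent id."""
    return {"uniqueid": doc_id, JOIN_FIELD: {"name": relation, "parent": parent_id}}


# The same settings would be used for parent and child writes (assumption):
# es.mapping.join names the field, and routing is left to its documented default.
esconf = {
    "es.mapping.id": "uniqueid",
    "es.mapping.join": JOIN_FIELD,
}

parent = parent_doc("p1", "danfamily")       # hypothetical id
child = child_doc("c1", "comfamily", "p1")   # hypothetical ids

print(json.dumps(parent))
print(json.dumps(child))
```

If documents were indexed with es.mapping.join pointing at a relation name, they would likely carry no join value at all, which would explain why plain searches work while has_parent and has_child return nothing.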