elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

can't append a value to array in elasticsearch from hive #2078

Open ealio opened 1 year ago

ealio commented 1 year ago

What kind of issue is this?

Issue description

I want to sync data from Hive to Elasticsearch using elasticsearch-hadoop. One field, `targetid`, is an array (mapped as `keyword` type). I want to append the new value to that array when I run an insert SQL statement in Hive, but the job always fails with the error below:

```
Ended Job = job_local1121272017_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-2: HDFS Read: 0 HDFS Write: 0 FAIL
```

Steps to reproduce

1. Create an index in Elasticsearch with the mapping shown below:

```
curl -X PUT -H "Content-Type:application/json" -d '{"mappings":{"employee":{"dynamic":false,"properties":{"empname":{"type":"keyword"},"id":{"type":"long"},"targetid":{"type":"keyword"}}}}}' "http://localhost:9200/vidaa"
```

It's created successfully as shown below.

```
es@ecs-18775:~$ curl "http://localhost:9200/vidaa/employee/_mapping?pretty"
{
  "vidaa" : {
    "mappings" : {
      "employee" : {
        "dynamic" : "false",
        "properties" : {
          "empname" : { "type" : "keyword" },
          "id" : { "type" : "long" },
          "targetid" : { "type" : "keyword" }
        }
      }
    }
  }
}
es@ecs-18775:~$
```

2. Create an external table in Hive, with an update script defined to append data on update:

```sql
CREATE EXTERNAL TABLE ext_employee (
  id BIGINT,
  empName STRING,
  targetid ARRAY<STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'vidaa/employee',
  'es.mapping.id' = 'id',
  'es.write.operation' = 'upsert',
  'es.update.script.params' = 'a_data:targetid',
  'es.update.script.inline' = 'ctx._source.targetid.add(params.a_data)',
  'es.nodes' = 'localhost',
  'es.port' = '9200',
  'es.nodes.wan.only' = 'true');
```
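One thing worth ruling out (a sketch of a possible cause, not confirmed by the logs above): `ctx._source.targetid.add(params.a_data)` throws an error in Painless if an existing document has no `targetid` field, since `ctx._source.targetid` is then `null`. A null-safe variant of the inline script would be:

```
'es.update.script.inline' = 'if (ctx._source.targetid == null) { ctx._source.targetid = [params.a_data]; } else { ctx._source.targetid.add(params.a_data); }'
```

This only matters if documents can exist without the field; it does not explain a failure on the very first upsert, where the full document is indexed as-is.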

3. Then I attempted to insert values into the ext_employee table, expecting the data to be synced to the Elasticsearch index. The SQL statements are:

```sql
insert into ext_employee values (8, 'Vicky', array('co2'));
insert into ext_employee values (9, 'Kevin', array('co3'));
```

4. I want 'co2' and 'co3' stored in an array in Elasticsearch, like `"targetid": ["co2", "co3"]`. But I always get the error below:

```
Ended Job = job_local1121272017_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-2: HDFS Read: 0 HDFS Write: 0 FAIL
```
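To narrow down whether the failure is in the update script itself or in the Hive/connector layer, the same scripted upsert can be attempted directly against the ES 6.x Update API with curl (a sketch; the document id, params, and upsert body below are assumptions derived from the first insert statement above):

```
curl -X POST -H "Content-Type:application/json" "http://localhost:9200/vidaa/employee/8/_update" -d '{
  "script": {
    "source": "ctx._source.targetid.add(params.a_data)",
    "lang": "painless",
    "params": { "a_data": "co2" }
  },
  "upsert": { "id": 8, "empname": "Vicky", "targetid": ["co2"] }
}'
```

If this request succeeds and a second run with a different `a_data` value appends to the array, the script and mapping are fine and the problem is on the Hive/es-hadoop side.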

Version Info

OS: Ubuntu 18
JVM: 1.8
Hadoop/Spark:
Hive: 3.1.3
ES-Hadoop: elasticsearch-hadoop-8.7.0.jar
ES: 6.1.4


jbaiera commented 1 year ago

Sorry for the late response here. Are you able to obtain more information from the failed tasks?