elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

ES Hadoop Library doesn't support fields with commas in them #1380

Open valguz opened 4 years ago

valguz commented 4 years ago

What kind of issue is this?

Issue description

Description: Elasticsearch supports field names that contain commas; however, the ES-Hadoop library does not appear to handle them.
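One plausible illustration of the failure mode (an assumption about the mechanism, not confirmed from the connector source): if projected field names are at any point joined into a comma-delimited list and split back apart, a name containing a comma is torn in two. A self-contained sketch:

```scala
object CommaSplitDemo {
  def main(args: Array[String]): Unit = {
    val field = "attributes.some column with a, comma and then some"

    // Round-tripping a projection list through a CSV-style string, as a
    // comma-delimited field list would, tears the single name into two.
    val roundTripped = Seq(field).mkString(",").split(",").map(_.trim)

    println(roundTripped.length) // 2 pieces instead of 1
    roundTripped.foreach(println)
  }
}
```

Either piece on its own no longer matches the mapped field, which is consistent with the "Position ... not found in row" error below.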

Steps to reproduce

Code:

On ES, create an index:

PUT test_index_3
{
  "mappings": {
    "properties": {
      "attributes": {
        "properties": {
          "some column with a, comma and then some": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Put some data in it:

PUT test_index_3/_doc/1
{
  "attributes": {
    "some column with a, comma and then some": "sdfdsf"
  }
}

Create a new program (this one is in Scala):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object Test {

  def main(args: Array[String]): Unit = {

    val sparkConf: SparkConf = new SparkConf().setMaster("local").setAppName("Test")
    val sc: SparkContext = SparkContext.getOrCreate(sparkConf)
    val spark: SparkSession = SparkSession.builder.getOrCreate()

    val options = Map(
      "es.port" -> "9200",
      "es.nodes" -> "elastic-local",
      "es.net.ssl" -> "false",
      "es.nodes.wan.only" -> "true",
      "es.net.http.auth.user" -> "elastic",
      "es.net.http.auth.pass" -> "whatever",
      "es.write.rest.error.handlers" -> "log",
      "es.write.rest.error.handler.log.logger.name" -> "BulkErrors",
      "es.read.metadata" -> "true",
      "es.scroll.size" -> "10000"
    )

    val df = spark.read.format("org.elasticsearch.spark.sql")
      .options(options)
      .load("test_index_3/_doc")
      .select("attributes.`some column with a, comma and then some`")
      .limit(5)

    df.printSchema()
    df.show(100, false)
  }
}

Running it fails with the following error:

Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency

Stack trace:

/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/bin/java "-javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=57407:/Applications/IntelliJ IDEA CE.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/jfxswt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtu
alMachines/jdk1.8.0_221.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/packager.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/lib/tools.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/target/scala-2.11/classes:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.scala-lang/scala-library/scala-library-2.11.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/aopalliance/aopalliance/aopalliance-1.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.carrotsearch/hppc/hppc-0.7.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.clearspring.analytics/stream/stream-2.7.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.databricks/spark-xml_2.11/spark-xml_2.11-0.5.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.esotericsoftware/kryo-shaded/kryo-shaded-4.0.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.esotericsoftware/minlog/minlog-1.3.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.fasterxml.jackson.core/jackson-annotations/jackson-annotations-2.9.9.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.fasterxml.jackson.core/jackson-core/jackson-core-2.9.9.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.fasterxml.jackson.core/jackson-databind/jackson-databind-2.9.9.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.fasterxml.jackson.module/jackson-module-paranamer/jackson-module-paranamer-2.9.9.jar:/Users/a4rb
wzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.fasterxml.jackson.module/jackson-module-scala_2.11/jackson-module-scala_2.11-2.9.9.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.github.luben/zstd-jni/zstd-jni-1.3.2-2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.google.code.findbugs/jsr305/jsr305-3.0.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.google.code.gson/gson/gson-2.2.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.google.guava/guava/guava-11.0.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.google.inject/guice/guice-3.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.microsoft.sqlserver/mssql-jdbc/mssql-jdbc-7.2.1.jre8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.ning/compress-lzf/compress-lzf-1.0.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.sksamuel.elastic4s/elastic4s-client-esjava_2.11/elastic4s-client-esjava_2.11-7.1.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.sksamuel.elastic4s/elastic4s-core_2.11/elastic4s-core_2.11-7.1.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.sksamuel.exts/exts_2.11/exts_2.11-1.61.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.thoughtworks.paranamer/paranamer/paranamer-2.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.twitter/chill-java/chill-java-0.9.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.twitter/chill_2.11/chill_2.11-0.9.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/com.typesafe/config/config-1.3.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.univocity/univocity-parsers/univocity-parsers-2.7.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/com.vlkan/flatbuffers/flat
buffers-1.2.0-3f79e055.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-beanutils/commons-beanutils/commons-beanutils-1.7.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-cli/commons-cli/commons-cli-1.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-codec/commons-codec/commons-codec-1.11.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-collections/commons-collections/commons-collections-3.2.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-configuration/commons-configuration/commons-configuration-1.6.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-digester/commons-digester/commons-digester-1.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-httpclient/commons-httpclient/commons-httpclient-3.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-io/commons-io/commons-io-2.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-lang/commons-lang/commons-lang-2.6.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-logging/commons-logging/commons-logging-1.1.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/commons-net/commons-net/commons-net-3.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/io.airlift/aircompressor/aircompressor-0.10.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/io.dropwizard.metrics/metrics-core/metrics-core-3.1.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/io.dropwizard.metrics/metrics-graphite/metrics-graphite-3.1.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/io.dropwizard.metrics/metrics-json/metrics-json-3.1.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/io.dropwizard.metrics/metrics-jvm/metrics-jvm-3.1.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/io.netty/netty/netty-3.9.9.Final.jar:/Users/a4rbwzz/IdeaProjects/Loca
lTesting/lib_managed/jars/io.netty/netty-all/netty-all-4.1.17.Final.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.activation/activation/activation-1.1.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.annotation/javax.annotation-api/javax.annotation-api-1.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.inject/javax.inject/javax.inject-1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.servlet/javax.servlet-api/javax.servlet-api-3.1.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.validation/validation-api/validation-api-1.1.0.Final.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.ws.rs/javax.ws.rs-api/javax.ws.rs-api-2.0.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.xml.bind/jaxb-api/jaxb-api-2.2.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/javax.xml.stream/stax-api/stax-api-1.0-2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/jline/jline/jline-0.9.94.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/joda-time/joda-time/joda-time-2.10.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/junit/junit/junit-3.8.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/log4j/log4j/log4j-1.2.17.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/net.razorvine/pyrolite/pyrolite-4.13.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/net.sf.py4j/py4j/py4j-0.10.7.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.antlr/antlr4-runtime/antlr4-runtime-4.7.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.arrow/arrow-format/arrow-format-0.10.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.arrow/arrow-memory/arrow-memory-0.10.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.arrow/arrow-vector/arrow-vector-0.10.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib
_managed/jars/org.apache.avro/avro/avro-1.8.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.avro/avro-ipc/avro-ipc-1.8.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.avro/avro-mapred/avro-mapred-1.8.2-hadoop2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.commons/commons-compress/commons-compress-1.8.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.commons/commons-crypto/commons-crypto-1.0.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.commons/commons-lang3/commons-lang3-3.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.commons/commons-math3/commons-math3-3.4.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.curator/curator-client/curator-client-2.6.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.curator/curator-framework/curator-framework-2.6.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.curator/curator-recipes/curator-recipes-2.6.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.directory.api/api-asn1-api/api-asn1-api-1.0.0-M20.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.directory.api/api-util/api-util-1.0.0-M20.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.directory.server/apacheds-i18n/apacheds-i18n-2.0.0-M15.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.directory.server/apacheds-kerberos-codec/apacheds-kerberos-codec-2.0.0-M15.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-annotations/hadoop-annotations-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-auth/hadoop-auth-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-client/hadoop-client-2.6.5.jar:/Users/a4rbwzz/IdeaProje
cts/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-common/hadoop-common-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-hdfs/hadoop-hdfs-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-mapreduce-client-app/hadoop-mapreduce-client-app-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-mapreduce-client-common/hadoop-mapreduce-client-common-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-mapreduce-client-core/hadoop-mapreduce-client-core-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-mapreduce-client-jobclient/hadoop-mapreduce-client-jobclient-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-mapreduce-client-shuffle/hadoop-mapreduce-client-shuffle-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-yarn-api/hadoop-yarn-api-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-yarn-client/hadoop-yarn-client-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-yarn-common/hadoop-yarn-common-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.hadoop/hadoop-yarn-server-common/hadoop-yarn-server-common-2.6.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.httpcomponents/httpasyncclient/httpasyncclient-4.1.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.httpcomponents/httpclient/httpclient-4.5.7.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.httpcomponents/httpcore/httpcore-4.4.11.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.httpcomponents/httpcore-nio/httpcore-nio-4.4.11.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.ivy/ivy/ivy-2
.4.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.orc/orc-core/orc-core-1.5.5-nohive.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.orc/orc-mapreduce/orc-mapreduce-1.5.5-nohive.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.orc/orc-shims/orc-shims-1.5.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.parquet/parquet-column/parquet-column-1.10.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.parquet/parquet-common/parquet-common-1.10.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.parquet/parquet-encoding/parquet-encoding-1.10.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.parquet/parquet-format/parquet-format-2.4.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.parquet/parquet-hadoop/parquet-hadoop-1.10.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.parquet/parquet-jackson/parquet-jackson-1.10.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-catalyst_2.11/spark-catalyst_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-core_2.11/spark-core_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-kvstore_2.11/spark-kvstore_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-launcher_2.11/spark-launcher_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-network-common_2.11/spark-network-common_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-network-shuffle_2.11/spark-network-shuffle_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-sketch_2.11/spark-sketch_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.
apache.spark/spark-sql_2.11/spark-sql_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-tags_2.11/spark-tags_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.spark/spark-unsafe_2.11/spark-unsafe_2.11-2.4.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.apache.xbean/xbean-asm6-shaded/xbean-asm6-shaded-4.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.apache.zookeeper/zookeeper/zookeeper-3.4.6.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.codehaus.jackson/jackson-core-asl/jackson-core-asl-1.9.13.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.codehaus.jackson/jackson-jaxrs/jackson-jaxrs-1.9.13.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.codehaus.jackson/jackson-mapper-asl/jackson-mapper-asl-1.9.13.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.codehaus.jackson/jackson-xc/jackson-xc-1.9.13.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.codehaus.janino/commons-compiler/commons-compiler-3.0.9.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.codehaus.janino/janino/janino-3.0.9.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.codehaus.jettison/jettison/jettison-1.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.elasticsearch/elasticsearch-spark-20_2.11/elasticsearch-spark-20_2.11-7.3.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.elasticsearch.client/elasticsearch-rest-client/elasticsearch-rest-client-7.1.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.fusesource.leveldbjni/leveldbjni-all/leveldbjni-all-1.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.hk2/hk2-api/hk2-api-2.4.0-b34.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.hk2/hk2-locator/hk2-locator-2.4.0-b34.jar:/Users/a4rbwzz/IdeaProjects/Loc
alTesting/lib_managed/jars/org.glassfish.hk2/hk2-utils/hk2-utils-2.4.0-b34.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.hk2/osgi-resource-locator/osgi-resource-locator-1.0.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.hk2.external/aopalliance-repackaged/aopalliance-repackaged-2.4.0-b34.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.hk2.external/javax.inject/javax.inject-2.4.0-b34.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.glassfish.jersey.bundles.repackaged/jersey-guava/jersey-guava-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.jersey.containers/jersey-container-servlet/jersey-container-servlet-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.jersey.containers/jersey-container-servlet-core/jersey-container-servlet-core-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.jersey.core/jersey-client/jersey-client-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.jersey.core/jersey-common/jersey-common-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.jersey.core/jersey-server/jersey-server-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.glassfish.jersey.media/jersey-media-jaxb/jersey-media-jaxb-2.22.2.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.htrace/htrace-core/htrace-core-3.0.4.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.javassist/javassist/javassist-3.18.1-GA.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.json4s/json4s-ast_2.11/json4s-ast_2.11-3.5.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.json4s/json4s-core_2.11/json4s-core_2.11-3.5.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.json4s/json4s-jackson_2.11/json4s-jackson_2.11-3.5.3.jar:/Users/a4rbwzz/
IdeaProjects/LocalTesting/lib_managed/jars/org.json4s/json4s-scalap_2.11/json4s-scalap_2.11-3.5.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.lz4/lz4-java/lz4-java-1.4.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.mortbay.jetty/jetty-util/jetty-util-6.1.26.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.objenesis/objenesis/objenesis-2.5.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.roaringbitmap/RoaringBitmap/RoaringBitmap-0.7.45.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.roaringbitmap/shims/shims-0.7.45.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.scala-lang/scala-reflect/scala-reflect-2.11.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.scala-lang.modules/scala-parser-combinators_2.11/scala-parser-combinators_2.11-1.1.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.scala-lang.modules/scala-xml_2.11/scala-xml_2.11-1.0.6.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.slf4j/jcl-over-slf4j/jcl-over-slf4j-1.7.16.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.slf4j/jul-to-slf4j/jul-to-slf4j-1.7.16.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.slf4j/slf4j-api/slf4j-api-1.7.26.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.slf4j/slf4j-log4j12/slf4j-log4j12-1.7.16.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.sonatype.sisu.inject/cglib/cglib-2.2.1-v20090111.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.spark-project.spark/unused/unused-1.0.0.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/org.tukaani/xz/xz-1.5.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/bundles/org.xerial.snappy/snappy-java/snappy-java-1.1.7.3.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/oro/oro/oro-2.0.8.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/xerces/xerc
esImpl/xercesImpl-2.9.1.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/xml-apis/xml-apis/xml-apis-1.3.04.jar:/Users/a4rbwzz/IdeaProjects/LocalTesting/lib_managed/jars/xmlenc/xmlenc/xmlenc-0.52.jar mmm.hamr.testing.Test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/10/22 13:02:48 INFO SparkContext: Running Spark version 2.4.4
19/10/22 13:02:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/22 13:02:48 INFO SparkContext: Submitted application: Test
19/10/22 13:02:49 INFO SecurityManager: Changing view acls to: a4rbwzz
19/10/22 13:02:49 INFO SecurityManager: Changing modify acls to: a4rbwzz
19/10/22 13:02:49 INFO SecurityManager: Changing view acls groups to: 
19/10/22 13:02:49 INFO SecurityManager: Changing modify acls groups to: 
19/10/22 13:02:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(a4rbwzz); groups with view permissions: Set(); users  with modify permissions: Set(a4rbwzz); groups with modify permissions: Set()
19/10/22 13:02:49 INFO Utils: Successfully started service 'sparkDriver' on port 57420.
19/10/22 13:02:49 INFO SparkEnv: Registering MapOutputTracker
19/10/22 13:02:49 INFO SparkEnv: Registering BlockManagerMaster
19/10/22 13:02:49 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/10/22 13:02:49 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/10/22 13:02:49 INFO DiskBlockManager: Created local directory at /private/var/folders/fr/6mz9w8z55cl8nxj50sgyrzzjv0tlx7/T/blockmgr-c91e54ea-8a01-4ed2-b68b-7bcca216d1ad
19/10/22 13:02:49 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
19/10/22 13:02:49 INFO SparkEnv: Registering OutputCommitCoordinator
19/10/22 13:02:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/10/22 13:02:49 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://3mc02sw9t9gtf1.mmm.com:4040
19/10/22 13:02:50 INFO Executor: Starting executor ID driver on host localhost
19/10/22 13:02:50 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57421.
19/10/22 13:02:50 INFO NettyBlockTransferService: Server created on 3mc02sw9t9gtf1.mmm.com:57421
19/10/22 13:02:50 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/10/22 13:02:50 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 3mc02sw9t9gtf1.mmm.com, 57421, None)
19/10/22 13:02:50 INFO BlockManagerMasterEndpoint: Registering block manager 3mc02sw9t9gtf1.mmm.com:57421 with 2004.6 MB RAM, BlockManagerId(driver, 3mc02sw9t9gtf1.mmm.com, 57421, None)
19/10/22 13:02:50 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 3mc02sw9t9gtf1.mmm.com, 57421, None)
19/10/22 13:02:50 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 3mc02sw9t9gtf1.mmm.com, 57421, None)
19/10/22 13:02:50 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
19/10/22 13:02:50 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/a4rbwzz/IdeaProjects/LocalTesting/spark-warehouse').
19/10/22 13:02:50 INFO SharedState: Warehouse path is 'file:/Users/a4rbwzz/IdeaProjects/LocalTesting/spark-warehouse'.
19/10/22 13:02:51 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/10/22 13:02:51 INFO Version: Elasticsearch Hadoop v7.3.1 [193e7103ee]
19/10/22 13:02:51 WARN Resource: Detected type name in resource [test_index_3/_doc]. Type names are deprecated and will be removed in a later release.
root
 |-- some column with a, comma and then some: string (nullable = true)

19/10/22 13:02:54 INFO CodeGenerator: Code generated in 294.010661 ms
19/10/22 13:02:54 INFO CodeGenerator: Code generated in 21.985345 ms
19/10/22 13:02:54 WARN Resource: Detected type name in resource [test_index_3/_doc]. Type names are deprecated and will be removed in a later release.
19/10/22 13:02:54 WARN Resource: Detected type name in resource [test_index_3/_doc]. Type names are deprecated and will be removed in a later release.
19/10/22 13:02:54 WARN Resource: Detected type name in resource [test_index_3/_doc]. Type names are deprecated and will be removed in a later release.
19/10/22 13:02:54 INFO ScalaEsRowRDD: Reading from [test_index_3/_doc]
19/10/22 13:02:54 INFO SparkContext: Starting job: show at Test.scala:36
19/10/22 13:02:54 INFO DAGScheduler: Got job 0 (show at Test.scala:36) with 1 output partitions
19/10/22 13:02:54 INFO DAGScheduler: Final stage: ResultStage 0 (show at Test.scala:36)
19/10/22 13:02:54 INFO DAGScheduler: Parents of final stage: List()
19/10/22 13:02:54 INFO DAGScheduler: Missing parents: List()
19/10/22 13:02:54 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at show at Test.scala:36), which has no missing parents
19/10/22 13:02:54 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 8.2 KB, free 2004.6 MB)
19/10/22 13:02:54 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.4 KB, free 2004.6 MB)
19/10/22 13:02:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 3mc02sw9t9gtf1.mmm.com:57421 (size: 4.4 KB, free: 2004.6 MB)
19/10/22 13:02:54 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/10/22 13:02:54 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at Test.scala:36) (first 15 tasks are for partitions Vector(0))
19/10/22 13:02:54 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
19/10/22 13:02:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 10343 bytes)
19/10/22 13:02:54 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/10/22 13:02:54 WARN Resource: Detected type name in resource [test_index_3/_doc]. Type names are deprecated and will be removed in a later release.
19/10/22 13:02:54 WARN Resource: Detected type name in resource [test_index_3/_doc]. Type names are deprecated and will be removed in a later release.
19/10/22 13:02:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.elasticsearch.hadoop.rest.EsHadoopParsingException: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:514)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:60)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:32)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:118)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:602)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:426)
    ... 27 more
19/10/22 13:02:54 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopParsingException: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:514)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:60)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:32)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:118)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:602)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:426)
    ... 27 more

19/10/22 13:02:54 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
19/10/22 13:02:54 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
19/10/22 13:02:54 INFO TaskSchedulerImpl: Cancelling stage 0
19/10/22 13:02:54 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
19/10/22 13:02:54 INFO DAGScheduler: ResultStage 0 (show at Test.scala:36) failed in 0.413 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopParsingException: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:514)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:60)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:32)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:118)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:602)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:426)
    ... 27 more

Driver stacktrace:
19/10/22 13:02:54 INFO DAGScheduler: Job 0 failed: show at Test.scala:36, took 0.468521 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopParsingException: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:514)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:60)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:32)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:118)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:602)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:426)
    ... 27 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:365)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2550)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:753)
    at mmm.hamr.testing.Test$.main(Test.scala:36)
    at mmm.hamr.testing.Test.main(Test.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopParsingException: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:514)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'attributes.some column with a, comma and then some' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:60)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:32)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:118)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:1047)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:889)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:602)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:426)
    ... 27 more
19/10/22 13:02:54 INFO SparkContext: Invoking stop() from shutdown hook
19/10/22 13:02:54 INFO SparkUI: Stopped Spark web UI at http://3mc02sw9t9gtf1.mmm.com:4040
19/10/22 13:02:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/10/22 13:02:55 INFO MemoryStore: MemoryStore cleared
19/10/22 13:02:55 INFO BlockManager: BlockManager stopped
19/10/22 13:02:55 INFO BlockManagerMaster: BlockManagerMaster stopped
19/10/22 13:02:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/10/22 13:02:55 INFO SparkContext: Successfully stopped SparkContext
19/10/22 13:02:55 INFO ShutdownHookManager: Shutdown hook called
19/10/22 13:02:55 INFO ShutdownHookManager: Deleting directory /private/var/folders/fr/6mz9w8z55cl8nxj50sgyrzzjv0tlx7/T/spark-2cf26843-c80d-4d34-9093-bb9893d1b815

Process finished with exit code 1
jbaiera commented 4 years ago

This is really a two-parter: fixing the immediate issue, and also making sure that we test with punctuation in field names across the entire project.

tegansnyder commented 4 years ago

Looking at https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/util/StringUtils.java#L42, and the place where the concatenation is done, https://github.com/elastic/elasticsearch-hadoop/blob/master/spark/sql-20/src/main/scala/org/elasticsearch/spark/sql/SchemaUtils.scala#L295, it looks like, from an outsider's perspective, representing the data internally as something other than a comma-delimited string is the way to go. We did a very hacky workaround by forking this library and removing the `final` declaration of the delimiter variable, so that in our app we could set org.elasticsearch.hadoop.util.StringUtils.DEFAULT_DELIMITER to something like the record separator character, "\u001e". That requires us to use that separator in other places as well; for instance, es.read.field.exclude becomes:

.option("es.read.field.exclude", "some_field_here\u001e other_field_here")

I look forward to following this issue and seeing what the ES Hadoop team comes up with. Thanks for posting this @valguz
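As a minimal illustration of why a comma delimiter breaks here (this is a sketch, not the connector's actual code, assuming field paths are joined and re-split on `,` as the linked StringUtils/SchemaUtils code suggests): a field name that itself contains a comma cannot survive a join/split round trip, while a delimiter that cannot appear in field names, such as the ASCII record separator, round-trips cleanly.

```scala
// Sketch: a comma-bearing field name is corrupted by a comma-delimited
// join/split, but survives with a record-separator delimiter.
object DelimiterRoundTrip {
  def main(args: Array[String]): Unit = {
    val fields = Seq("plain_field", "some column with a, comma and then some")

    // Joining on "," and splitting again corrupts the list: the field
    // containing a comma is broken into two entries.
    val viaComma = fields.mkString(",").split(",").toSeq
    assert(viaComma.size == 3)
    assert(viaComma != fields)

    // The record separator "\u001e" cannot appear in a sane field name,
    // so the round trip preserves the original list.
    val viaRs = fields.mkString("\u001e").split("\u001e").toSeq
    assert(viaRs == fields)

    println("record-separator round trip preserves field names")
  }
}
```

This is why the forked-delimiter hack above works: the positions computed at schema time and the keys seen at read time line up again once the delimiter can no longer collide with field-name characters.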

masseyke commented 2 years ago

I've got a draft PR up that fixes this, but we're a little worried that it might introduce unexpected new problems. Is anyone still running into this? I'm not sure how common it is to use commas in field names (I had never seen it before coming across this ticket, and would have guessed that Elasticsearch did not allow it).