elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

Warning: Local jar org.apache.curator_apache-curator-2.6.0.jar does not exist #895

Closed aeneaswiener closed 7 years ago

aeneaswiener commented 8 years ago

What kind of issue is this?

Issue description

When installing elasticsearch-hadoop as a Spark package, I get a warning about a missing jar file (org.apache.curator_apache-curator-2.6.0.jar). The PySpark driver output then reports an error for the same missing jar, which prevents the PySpark shell from starting up successfully.

Steps to reproduce

  1. Start a Google Cloud Dataproc cluster via
gcloud beta dataproc clusters create my_test_cluster_01
  2. After logging into the master node, start the PySpark shell as follows, which requests elasticsearch-hadoop:2.4.0 to be fetched from Maven Central as a package:
$ pyspark --packages org.elasticsearch:elasticsearch-hadoop:2.4.0 --repositories http://conjars.org/repo,https://clojars.org/repo
  3. The PySpark driver output contains the following error mentioning a missing jar file (org.apache.curator_apache-curator-2.6.0.jar):
        ring#ring-core;0.3.11 by [ring#ring-core;1.1.5] in [default]
        commons-lang#commons-lang;2.5 by [commons-lang#commons-lang;2.6] in [default]
        org.slf4j#slf4j-api;1.6.6 by [org.slf4j#slf4j-api;1.7.5] in [default]
        jline#jline;2.11 by [jline#jline;2.12] in [default]
        commons-collections#commons-collections;3.1 by [commons-collections#commons-collections;3.2.1] in [default]
        commons-httpclient#commons-httpclient;3.0.1 by [commons-httpclient#commons-httpclient;3.1] in [default]
        asm#asm;3.1 by [asm#asm;3.2] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |  193  |   0   |   0   |   27  ||  166  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 166 already retrieved (0kB/64ms)
Warning: Local jar /home/aeneas.wiener/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar does not exist, skipping.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/16 09:49:09 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/home/aeneas.wiener/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar does not exist
  4. This prevents the PySpark shell from starting up successfully; see the full stack trace / driver output below.
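A likely cause (my assumption, based on the artifact name): org.apache.curator:apache-curator is Curator's parent POM, published to Maven Central with `<packaging>pom</packaging>`, so no jar artifact exists for it at all — yet Spark's Ivy resolution still adds a jar path for it under ~/.ivy2/jars, which then fails the existence check. A minimal sketch illustrating what pom-only packaging looks like:

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt of what the apache-curator 2.6.0 parent POM declares
# (not the full POM; the point is the <packaging>pom</packaging> element):
pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.apache.curator</groupId>
  <artifactId>apache-curator</artifactId>
  <version>2.6.0</version>
  <packaging>pom</packaging>
</project>"""

ns = {"m": "http://maven.apache.org/POM/4.0.0"}
root = ET.fromstring(pom)
packaging = root.find("m:packaging", ns).text

# A pom-packaged artifact publishes no jar, so a path like
# ~/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar can never exist.
print(packaging)
```

If that diagnosis is right, spark-submit's --exclude-packages option (e.g. `--exclude-packages org.apache.curator:apache-curator`) may serve as a workaround by dropping the pom-only artifact from dependency resolution.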

Version Info

OS          :  Debian GNU/Linux, started via Google Cloud Dataproc
JVM         :  openjdk version "1.8.0_111"
Hadoop/Spark:  Apache Spark 2.0.1, Apache Hadoop 2.7.3
ES-Hadoop   :  2.4.0
ES          :  2.2.1

Full stack trace:

$ pyspark --packages org.elasticsearch:elasticsearch-hadoop:2.4.0 --repositories http://conjars.org/repo,https://clojars.org/repo
Python 2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /home/aeneas.wiener/.ivy2/cache
The jars for the packages stored in: /home/aeneas.wiener/.ivy2/jars
http://conjars.org/repo added as a remote repository with the name: repo-1
https://clojars.org/repo added as a remote repository with the name: repo-2
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.elasticsearch#elasticsearch-hadoop added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found org.elasticsearch#elasticsearch-hadoop;2.4.0 in central
    found org.apache.pig#pig;0.15.0 in central
    found commons-cli#commons-cli;1.2 in central
    found xmlenc#xmlenc;0.52 in central
    found commons-httpclient#commons-httpclient;3.1 in central
    found commons-logging#commons-logging;1.0.4 in central
    found commons-codec#commons-codec;1.4 in central
    found commons-net#commons-net;1.4.1 in central
    found oro#oro;2.0.8 in central
    found org.mortbay.jetty#jetty;6.1.26 in central
    found org.mortbay.jetty#jetty-util;6.1.26 in central
    found org.mortbay.jetty#servlet-api;2.5-20081211 in central
    found tomcat#jasper-runtime;5.5.12 in central
    found tomcat#jasper-compiler;5.5.12 in central
    found org.mortbay.jetty#jsp-api-2.1;6.1.14 in central
    found org.mortbay.jetty#servlet-api-2.5;6.1.14 in central
    found org.mortbay.jetty#jsp-2.1;6.1.14 in central
    found org.eclipse.jdt#core;3.1.1 in central
    found ant#ant;1.6.5 in central
    found commons-el#commons-el;1.0 in central
    found net.java.dev.jets3t#jets3t;0.7.1 in repo-2
    found commons-logging#commons-logging;1.1.1 in central
    found net.sf.kosmosfs#kfs;0.3 in central
    found junit#junit;4.8.1 in repo-2
    found hsqldb#hsqldb;1.8.0.10 in central
    found jline#jline;1.0 in central
    found org.antlr#ST4;4.0.4 in central
    found org.antlr#antlr-runtime;3.4 in central
    found org.antlr#stringtemplate;3.2.1 in central
    found antlr#antlr;2.7.7 in central
    found dk.brics.automaton#automaton;1.11-8 in central
    found com.google.guava#guava;11.0 in central
    found com.google.code.findbugs#jsr305;1.3.9 in central
    found org.codehaus.jackson#jackson-mapper-asl;1.8.8 in central
    found org.codehaus.jackson#jackson-core-asl;1.8.8 in central
    found joda-time#joda-time;2.1 in central
    found org.apache.avro#avro;1.7.4 in central
    found com.thoughtworks.paranamer#paranamer;2.3 in central
    found org.xerial.snappy#snappy-java;1.0.4.1 in central
    found org.apache.commons#commons-compress;1.4.1 in central
    found org.tukaani#xz;1.0 in central
    found org.slf4j#slf4j-api;1.6.4 in central
    found cascading#cascading-local;2.6.3 in repo-1
    found cascading#cascading-core;2.6.3 in repo-1
    found riffle#riffle;0.1-dev in repo-1
    found thirdparty#jgrapht-jdk1.6;0.8.1 in repo-1
    found org.codehaus.janino#janino;2.7.5 in central
    found org.codehaus.janino#commons-compiler;2.7.5 in central
    found org.slf4j#slf4j-api;1.7.2 in central
    found com.google.guava#guava;14.0.1 in central
    found org.apache.storm#storm-core;0.9.6 in central
    found org.clojure#clojure;1.5.1 in central
    found clj-time#clj-time;0.4.1 in repo-2
    found compojure#compojure;1.1.3 in repo-2
    found org.clojure#core.incubator;0.1.0 in central
    found org.clojure#tools.macro;0.1.0 in central
    found clout#clout;1.0.1 in repo-2
    found ring#ring-core;1.1.5 in repo-2
    found commons-codec#commons-codec;1.6 in central
    found commons-io#commons-io;2.4 in central
    found commons-fileupload#commons-fileupload;1.2.1 in central
    found javax.servlet#servlet-api;2.5 in central
    found hiccup#hiccup;0.3.6 in repo-2
    found ring#ring-devel;0.3.11 in repo-2
    found clj-stacktrace#clj-stacktrace;0.2.2 in repo-2
    found ring#ring-jetty-adapter;0.3.11 in repo-2
    found ring#ring-servlet;0.3.11 in repo-2
    found org.clojure#tools.logging;0.2.3 in central
    found org.clojure#math.numeric-tower;0.0.1 in central
    found org.clojure#tools.cli;0.2.4 in central
    found org.apache.commons#commons-exec;1.1 in central
    found commons-lang#commons-lang;2.5 in central
    found com.googlecode.json-simple#json-simple;1.1 in central
    found com.twitter#carbonite;1.4.0 in repo-2
    found com.esotericsoftware.kryo#kryo;2.21 in central
    found com.esotericsoftware.reflectasm#reflectasm;1.07 in central
    found org.ow2.asm#asm;4.0 in central
    found com.esotericsoftware.minlog#minlog;1.2 in central
    found org.objenesis#objenesis;1.2 in central
    found com.twitter#chill-java;0.3.5 in central
    found org.yaml#snakeyaml;1.11 in central
    found commons-logging#commons-logging;1.1.3 in central
    found com.googlecode.disruptor#disruptor;2.10.4 in central
    found org.jgrapht#jgrapht-core;0.9.0 in central
    found ch.qos.logback#logback-classic;1.0.13 in central
    found ch.qos.logback#logback-core;1.0.13 in central
    found org.slf4j#slf4j-api;1.7.5 in central
    found org.slf4j#log4j-over-slf4j;1.6.6 in central
    found jline#jline;2.11 in central
    found cascading#cascading-hadoop;2.6.3 in repo-1
    found org.apache.hive#hive-service;1.2.1 in central
    found org.apache.hive#hive-exec;1.2.1 in central
    found org.apache.hive#hive-ant;1.2.1 in central
    found commons-lang#commons-lang;2.6 in central
    found org.apache.ant#ant;1.9.1 in central
    found org.apache.ant#ant-launcher;1.9.1 in central
    found org.apache.velocity#velocity;1.5 in central
    found commons-collections#commons-collections;3.1 in central
    found org.slf4j#slf4j-log4j12;1.7.5 in central
    found log4j#log4j;1.2.16 in central
    found org.apache.hive#hive-metastore;1.2.1 in central
    found org.apache.hive#hive-serde;1.2.1 in central
    found org.apache.hive#hive-common;1.2.1 in central
    found org.apache.hive#hive-shims;1.2.1 in central
    found org.apache.hive.shims#hive-shims-common;1.2.1 in central
    found log4j#apache-log4j-extras;1.2.17 in central
    found org.apache.thrift#libthrift;0.9.2 in central
    found org.apache.httpcomponents#httpclient;4.4 in central
    found org.apache.httpcomponents#httpcore;4.4 in central
    found org.apache.curator#curator-framework;2.6.0 in central
    found org.apache.curator#curator-client;2.6.0 in central
    found org.apache.zookeeper#zookeeper;3.4.6 in central
    found jline#jline;2.12 in central
    found io.netty#netty;3.7.0.Final in central
    found joda-time#joda-time;2.5 in central
    found org.json#json;20090211 in central
    found com.google.code.findbugs#jsr305;3.0.0 in central
    found org.apache.avro#avro;1.7.5 in central
    found org.codehaus.jackson#jackson-core-asl;1.9.2 in central
    found org.codehaus.jackson#jackson-mapper-asl;1.9.2 in central
    found org.xerial.snappy#snappy-java;1.0.5 in central
    found net.sf.opencsv#opencsv;2.3 in central
    found com.twitter#parquet-hadoop-bundle;1.6.0 in central
    found com.jolbox#bonecp;0.8.0.RELEASE in central
    found org.apache.derby#derby;10.10.2.0 in central
    found org.datanucleus#datanucleus-api-jdo;3.2.6 in central
    found org.datanucleus#datanucleus-core;3.2.10 in central
    found org.datanucleus#datanucleus-rdbms;3.2.9 in central
    found commons-pool#commons-pool;1.5.4 in repo-2
    found commons-dbcp#commons-dbcp;1.4 in central
    found javax.jdo#jdo-api;3.0.1 in central
    found javax.transaction#jta;1.1 in central
    found org.apache.thrift#libfb303;0.9.2 in central
    found org.apache.ivy#ivy;2.4.0 in central
    found org.apache.curator#apache-curator;2.6.0 in central
    found org.codehaus.groovy#groovy-all;2.1.6 in central
    found org.apache.calcite#calcite-core;1.2.0-incubating in central
    found org.apache.calcite#calcite-avatica;1.2.0-incubating in central
    found org.apache.calcite#calcite-linq4j;1.2.0-incubating in central
    found net.hydromatic#eigenbase-properties;1.1.5 in central
    found org.codehaus.janino#janino;2.7.6 in central
    found org.codehaus.janino#commons-compiler;2.7.6 in central
    found org.pentaho#pentaho-aggdesigner-algorithm;5.1.5-jhyde in repo-1
    found stax#stax-api;1.0.1 in central
    found net.sf.jpam#jpam;1.1 in central
    found org.eclipse.jetty.aggregate#jetty-all;7.6.0.v20120127 in central
    found org.apache.geronimo.specs#geronimo-jta_1.1_spec;1.1.1 in central
    found javax.mail#mail;1.4.1 in central
    found javax.activation#activation;1.1 in central
    found org.apache.geronimo.specs#geronimo-jaspic_1.0_spec;1.0 in central
    found org.apache.geronimo.specs#geronimo-annotation_1.0_spec;1.1.1 in central
    found asm#asm-commons;3.1 in central
    found asm#asm-tree;3.1 in central
    found asm#asm;3.1 in central
    found org.apache.curator#curator-recipes;2.6.0 in central
    found org.apache.hive.shims#hive-shims-0.20S;1.2.1 in central
    found org.apache.hive.shims#hive-shims-0.23;1.2.1 in central
    found org.apache.hadoop#hadoop-yarn-server-resourcemanager;2.6.0 in central
    found org.apache.hadoop#hadoop-annotations;2.6.0 in central
    found com.google.inject.extensions#guice-servlet;3.0 in central
    found com.google.inject#guice;3.0 in central
    found javax.inject#javax.inject;1 in central
    found aopalliance#aopalliance;1.0 in central
    found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central
    found asm#asm;3.2 in central
    found com.google.protobuf#protobuf-java;2.5.0 in central
    found com.sun.jersey#jersey-json;1.14 in central
    found org.codehaus.jettison#jettison;1.1 in central
    found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
    found javax.xml.bind#jaxb-api;2.2.2 in central
    found javax.xml.stream#stax-api;1.0-2 in central
    found org.codehaus.jackson#jackson-jaxrs;1.9.2 in central
    found org.codehaus.jackson#jackson-xc;1.9.2 in central
    found com.sun.jersey.contribs#jersey-guice;1.9 in central
    found org.apache.hadoop#hadoop-yarn-common;2.6.0 in central
    found org.apache.hadoop#hadoop-yarn-api;2.6.0 in central
    found com.sun.jersey#jersey-core;1.14 in central
    found com.sun.jersey#jersey-client;1.9 in central
    found com.sun.jersey#jersey-server;1.14 in central
    found org.apache.hadoop#hadoop-yarn-server-common;2.6.0 in central
    found org.fusesource.leveldbjni#leveldbjni-all;1.8 in central
    found org.apache.hadoop#hadoop-yarn-server-applicationhistoryservice;2.6.0 in central
    found commons-collections#commons-collections;3.2.1 in central
    found org.apache.hadoop#hadoop-yarn-server-web-proxy;2.6.0 in central
    found org.apache.hive.shims#hive-shims-scheduler;1.2.1 in central
:: resolution report :: resolve 4111ms :: artifacts dl 88ms
    :: modules in use:
    ant#ant;1.6.5 from central in [default]
    antlr#antlr;2.7.7 from central in [default]
    aopalliance#aopalliance;1.0 from central in [default]
    asm#asm;3.2 from central in [default]
    asm#asm-commons;3.1 from central in [default]
    asm#asm-tree;3.1 from central in [default]
    cascading#cascading-core;2.6.3 from repo-1 in [default]
    cascading#cascading-hadoop;2.6.3 from repo-1 in [default]
    cascading#cascading-local;2.6.3 from repo-1 in [default]
    ch.qos.logback#logback-classic;1.0.13 from central in [default]
    ch.qos.logback#logback-core;1.0.13 from central in [default]
    clj-stacktrace#clj-stacktrace;0.2.2 from repo-2 in [default]
    clj-time#clj-time;0.4.1 from repo-2 in [default]
    clout#clout;1.0.1 from repo-2 in [default]
    com.esotericsoftware.kryo#kryo;2.21 from central in [default]
    com.esotericsoftware.minlog#minlog;1.2 from central in [default]
    com.esotericsoftware.reflectasm#reflectasm;1.07 from central in [default]
    com.google.code.findbugs#jsr305;3.0.0 from central in [default]
    com.google.guava#guava;14.0.1 from central in [default]
    com.google.inject#guice;3.0 from central in [default]
    com.google.inject.extensions#guice-servlet;3.0 from central in [default]
    com.google.protobuf#protobuf-java;2.5.0 from central in [default]
    com.googlecode.disruptor#disruptor;2.10.4 from central in [default]
    com.googlecode.json-simple#json-simple;1.1 from central in [default]
    com.jolbox#bonecp;0.8.0.RELEASE from central in [default]
    com.sun.jersey#jersey-client;1.9 from central in [default]
    com.sun.jersey#jersey-core;1.14 from central in [default]
    com.sun.jersey#jersey-json;1.14 from central in [default]
    com.sun.jersey#jersey-server;1.14 from central in [default]
    com.sun.jersey.contribs#jersey-guice;1.9 from central in [default]
    com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
    com.twitter#carbonite;1.4.0 from repo-2 in [default]
    com.twitter#chill-java;0.3.5 from central in [default]
    com.twitter#parquet-hadoop-bundle;1.6.0 from central in [default]
    commons-cli#commons-cli;1.2 from central in [default]
    commons-codec#commons-codec;1.6 from central in [default]
    commons-collections#commons-collections;3.2.1 from central in [default]
    commons-dbcp#commons-dbcp;1.4 from central in [default]
    commons-el#commons-el;1.0 from central in [default]
    commons-fileupload#commons-fileupload;1.2.1 from central in [default]
    commons-httpclient#commons-httpclient;3.1 from central in [default]
    commons-io#commons-io;2.4 from central in [default]
    commons-lang#commons-lang;2.6 from central in [default]
    commons-logging#commons-logging;1.1.3 from central in [default]
    commons-net#commons-net;1.4.1 from central in [default]
    commons-pool#commons-pool;1.5.4 from repo-2 in [default]
    compojure#compojure;1.1.3 from repo-2 in [default]
    dk.brics.automaton#automaton;1.11-8 from central in [default]
    hiccup#hiccup;0.3.6 from repo-2 in [default]
    hsqldb#hsqldb;1.8.0.10 from central in [default]
    io.netty#netty;3.7.0.Final from central in [default]
    javax.activation#activation;1.1 from central in [default]
    javax.inject#javax.inject;1 from central in [default]
    javax.jdo#jdo-api;3.0.1 from central in [default]
    javax.mail#mail;1.4.1 from central in [default]
    javax.servlet#servlet-api;2.5 from central in [default]
    javax.transaction#jta;1.1 from central in [default]
    javax.xml.bind#jaxb-api;2.2.2 from central in [default]
    javax.xml.stream#stax-api;1.0-2 from central in [default]
    jline#jline;2.12 from central in [default]
    joda-time#joda-time;2.5 from central in [default]
    junit#junit;4.8.1 from repo-2 in [default]
    log4j#apache-log4j-extras;1.2.17 from central in [default]
    log4j#log4j;1.2.16 from central in [default]
    net.hydromatic#eigenbase-properties;1.1.5 from central in [default]
    net.java.dev.jets3t#jets3t;0.7.1 from repo-2 in [default]
    net.sf.jpam#jpam;1.1 from central in [default]
    net.sf.kosmosfs#kfs;0.3 from central in [default]
    net.sf.opencsv#opencsv;2.3 from central in [default]
    org.antlr#ST4;4.0.4 from central in [default]
    org.antlr#antlr-runtime;3.4 from central in [default]
    org.antlr#stringtemplate;3.2.1 from central in [default]
    org.apache.ant#ant;1.9.1 from central in [default]
    org.apache.ant#ant-launcher;1.9.1 from central in [default]
    org.apache.avro#avro;1.7.5 from central in [default]
    org.apache.calcite#calcite-avatica;1.2.0-incubating from central in [default]
    org.apache.calcite#calcite-core;1.2.0-incubating from central in [default]
    org.apache.calcite#calcite-linq4j;1.2.0-incubating from central in [default]
    org.apache.commons#commons-compress;1.4.1 from central in [default]
    org.apache.commons#commons-exec;1.1 from central in [default]
    org.apache.curator#apache-curator;2.6.0 from central in [default]
    org.apache.curator#curator-client;2.6.0 from central in [default]
    org.apache.curator#curator-framework;2.6.0 from central in [default]
    org.apache.curator#curator-recipes;2.6.0 from central in [default]
    org.apache.derby#derby;10.10.2.0 from central in [default]
    org.apache.geronimo.specs#geronimo-annotation_1.0_spec;1.1.1 from central in [default]
    org.apache.geronimo.specs#geronimo-jaspic_1.0_spec;1.0 from central in [default]
    org.apache.geronimo.specs#geronimo-jta_1.1_spec;1.1.1 from central in [default]
    org.apache.hadoop#hadoop-annotations;2.6.0 from central in [default]
    org.apache.hadoop#hadoop-yarn-api;2.6.0 from central in [default]
    org.apache.hadoop#hadoop-yarn-common;2.6.0 from central in [default]
    org.apache.hadoop#hadoop-yarn-server-applicationhistoryservice;2.6.0 from central in [default]
    org.apache.hadoop#hadoop-yarn-server-common;2.6.0 from central in [default]
    org.apache.hadoop#hadoop-yarn-server-resourcemanager;2.6.0 from central in [default]
    org.apache.hadoop#hadoop-yarn-server-web-proxy;2.6.0 from central in [default]
    org.apache.hive#hive-ant;1.2.1 from central in [default]
    org.apache.hive#hive-common;1.2.1 from central in [default]
    org.apache.hive#hive-exec;1.2.1 from central in [default]
    org.apache.hive#hive-metastore;1.2.1 from central in [default]
    org.apache.hive#hive-serde;1.2.1 from central in [default]
    org.apache.hive#hive-service;1.2.1 from central in [default]
    org.apache.hive#hive-shims;1.2.1 from central in [default]
    org.apache.hive.shims#hive-shims-0.20S;1.2.1 from central in [default]
    org.apache.hive.shims#hive-shims-0.23;1.2.1 from central in [default]
    org.apache.hive.shims#hive-shims-common;1.2.1 from central in [default]
    org.apache.hive.shims#hive-shims-scheduler;1.2.1 from central in [default]
    org.apache.httpcomponents#httpclient;4.4 from central in [default]
    org.apache.httpcomponents#httpcore;4.4 from central in [default]
    org.apache.ivy#ivy;2.4.0 from central in [default]
    org.apache.pig#pig;0.15.0 from central in [default]
    org.apache.storm#storm-core;0.9.6 from central in [default]
    org.apache.thrift#libfb303;0.9.2 from central in [default]
    org.apache.thrift#libthrift;0.9.2 from central in [default]
    org.apache.velocity#velocity;1.5 from central in [default]
    org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
    org.clojure#clojure;1.5.1 from central in [default]
    org.clojure#core.incubator;0.1.0 from central in [default]
    org.clojure#math.numeric-tower;0.0.1 from central in [default]
    org.clojure#tools.cli;0.2.4 from central in [default]
    org.clojure#tools.logging;0.2.3 from central in [default]
    org.clojure#tools.macro;0.1.0 from central in [default]
    org.codehaus.groovy#groovy-all;2.1.6 from central in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.2 from central in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.2 from central in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.2 from central in [default]
    org.codehaus.jackson#jackson-xc;1.9.2 from central in [default]
    org.codehaus.janino#commons-compiler;2.7.6 from central in [default]
    org.codehaus.janino#janino;2.7.6 from central in [default]
    org.codehaus.jettison#jettison;1.1 from central in [default]
    org.datanucleus#datanucleus-api-jdo;3.2.6 from central in [default]
    org.datanucleus#datanucleus-core;3.2.10 from central in [default]
    org.datanucleus#datanucleus-rdbms;3.2.9 from central in [default]
    org.eclipse.jdt#core;3.1.1 from central in [default]
    org.eclipse.jetty.aggregate#jetty-all;7.6.0.v20120127 from central in [default]
    org.elasticsearch#elasticsearch-hadoop;2.4.0 from central in [default]
    org.fusesource.leveldbjni#leveldbjni-all;1.8 from central in [default]
    org.jgrapht#jgrapht-core;0.9.0 from central in [default]
    org.json#json;20090211 from central in [default]
    org.mortbay.jetty#jetty;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
    org.mortbay.jetty#jsp-2.1;6.1.14 from central in [default]
    org.mortbay.jetty#jsp-api-2.1;6.1.14 from central in [default]
    org.mortbay.jetty#servlet-api;2.5-20081211 from central in [default]
    org.mortbay.jetty#servlet-api-2.5;6.1.14 from central in [default]
    org.objenesis#objenesis;1.2 from central in [default]
    org.ow2.asm#asm;4.0 from central in [default]
    org.pentaho#pentaho-aggdesigner-algorithm;5.1.5-jhyde from repo-1 in [default]
    org.slf4j#log4j-over-slf4j;1.6.6 from central in [default]
    org.slf4j#slf4j-api;1.7.5 from central in [default]
    org.slf4j#slf4j-log4j12;1.7.5 from central in [default]
    org.sonatype.sisu.inject#cglib;2.2.1-v20090111 from central in [default]
    org.tukaani#xz;1.0 from central in [default]
    org.xerial.snappy#snappy-java;1.0.5 from central in [default]
    org.yaml#snakeyaml;1.11 from central in [default]
    oro#oro;2.0.8 from central in [default]
    riffle#riffle;0.1-dev from repo-1 in [default]
    ring#ring-core;1.1.5 from repo-2 in [default]
    ring#ring-devel;0.3.11 from repo-2 in [default]
    ring#ring-jetty-adapter;0.3.11 from repo-2 in [default]
    ring#ring-servlet;0.3.11 from repo-2 in [default]
    stax#stax-api;1.0.1 from central in [default]
    thirdparty#jgrapht-jdk1.6;0.8.1 from repo-1 in [default]
    tomcat#jasper-compiler;5.5.12 from central in [default]
    tomcat#jasper-runtime;5.5.12 from central in [default]
    xmlenc#xmlenc;0.52 from central in [default]
    :: evicted modules:
    commons-logging#commons-logging;1.0.4 by [commons-logging#commons-logging;1.1.1] in [default]
    commons-codec#commons-codec;1.2 by [commons-codec#commons-codec;1.4] in [default]
    commons-codec#commons-codec;1.4 by [commons-codec#commons-codec;1.6] in [default]
    commons-logging#commons-logging;1.0.3 by [commons-logging#commons-logging;1.1.1] in [default]
    commons-codec#commons-codec;1.3 by [commons-codec#commons-codec;1.4] in [default]
    commons-logging#commons-logging;1.1.1 by [commons-logging#commons-logging;1.1.3] in [default]
    jline#jline;1.0 by [jline#jline;2.11] in [default]
    org.antlr#antlr-runtime;3.3 by [org.antlr#antlr-runtime;3.4] in [default]
    com.google.guava#guava;11.0 by [com.google.guava#guava;14.0.1] in [default]
    com.google.code.findbugs#jsr305;1.3.9 by [com.google.code.findbugs#jsr305;3.0.0] in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.8.8 by [org.codehaus.jackson#jackson-mapper-asl;1.9.2] in [default]
    org.codehaus.jackson#jackson-core-asl;1.8.8 by [org.codehaus.jackson#jackson-core-asl;1.9.2] in [default]
    joda-time#joda-time;2.1 by [joda-time#joda-time;2.5] in [default]
    org.apache.avro#avro;1.7.4 by [org.apache.avro#avro;1.7.5] in [default]
    org.xerial.snappy#snappy-java;1.0.4.1 by [org.xerial.snappy#snappy-java;1.0.5] in [default]
    org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.2] in [default]
    org.codehaus.janino#janino;2.7.5 by [org.codehaus.janino#janino;2.7.6] in [default]
    org.codehaus.janino#commons-compiler;2.7.5 by [org.codehaus.janino#commons-compiler;2.7.6] in [default]
    org.slf4j#slf4j-api;1.7.2 by [org.slf4j#slf4j-api;1.7.5] in [default]
    joda-time#joda-time;2.0 by [joda-time#joda-time;2.1] in [default]
    ring#ring-core;0.3.11 by [ring#ring-core;1.1.5] in [default]
    commons-lang#commons-lang;2.5 by [commons-lang#commons-lang;2.6] in [default]
    org.slf4j#slf4j-api;1.6.6 by [org.slf4j#slf4j-api;1.7.5] in [default]
    jline#jline;2.11 by [jline#jline;2.12] in [default]
    commons-collections#commons-collections;3.1 by [commons-collections#commons-collections;3.2.1] in [default]
    commons-httpclient#commons-httpclient;3.0.1 by [commons-httpclient#commons-httpclient;3.1] in [default]
    asm#asm;3.1 by [asm#asm;3.2] in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |  193  |   0   |   0   |   27  ||  166  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 166 already retrieved (0kB/59ms)
Warning: Local jar /home/aeneas.wiener/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar does not exist, skipping.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/16 09:55:33 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/home/aeneas.wiener/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:553)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:552)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:552)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:551)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:551)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
16/11/16 09:55:33 WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
16/11/16 09:55:33 ERROR org.apache.spark.util.Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
    at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1360)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:87)
    at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1814)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1287)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1813)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
16/11/16 09:55:33 WARN org.apache.spark.SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor).  This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:236)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:745)
16/11/16 09:55:34 WARN org.apache.hadoop.hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
16/11/16 09:55:37 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/home/aeneas.wiener/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:553)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:552)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:552)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:551)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:551)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
16/11/16 09:55:37 WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
16/11/16 09:55:37 ERROR org.apache.spark.util.Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
    at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1360)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:87)
    at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1814)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1287)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1813)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/shell.py", line 47, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 169, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/usr/lib/spark/python/pyspark/context.py", line 294, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/usr/lib/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/usr/lib/spark/python/pyspark/context.py", line 168, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/lib/spark/python/pyspark/context.py", line 233, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1401, in __call__
  File "/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:/home/aeneas.wiener/.ivy2/jars/org.apache.curator_apache-curator-2.6.0.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:340)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:433)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:553)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:552)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:552)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:551)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:551)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

>>>
jbaiera commented 8 years ago

@aeneaswiener Does this error manifest when running without the es-hadoop libraries? Curator is a dependency of Spark; ES-Hadoop makes no explicit calls to it.

aeneaswiener commented 8 years ago

@jbaiera the error only appears once I request ES-Hadoop by passing it via the `--packages` flag.

I have no issues running Spark jobs unless I need to use the ES-Hadoop library.

One workaround I have found is copying another jar into the location of the missing one. The PySpark shell then starts up fine, and ES-Hadoop has worked for everything I have tried so far. It is an ugly temporary workaround, though, not a fix.
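For reference, a minimal sketch of that workaround (the path is the one from the error log above; the elasticsearch-hadoop jar name follows Spark's usual `groupId_artifactId-version.jar` naming in `~/.ivy2/jars` and is an assumption here). Spark only checks that the file exists before copying it to the cluster, so any jar, or even an empty placeholder, appears to satisfy the check:

```shell
# Path Spark complains about, taken from the FileNotFoundException above
IVY_JARS="$HOME/.ivy2/jars"
MISSING="$IVY_JARS/org.apache.curator_apache-curator-2.6.0.jar"
mkdir -p "$IVY_JARS"
# Copy any already-resolved jar into the expected location (name assumed),
# or fall back to an empty placeholder file if none is available.
cp "$IVY_JARS/org.elasticsearch_elasticsearch-hadoop-2.4.0.jar" "$MISSING" 2>/dev/null \
  || touch "$MISSING"
```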

jbaiera commented 7 years ago

This feels more like an issue with Spark, to be honest. The logs show that Spark was able to find the curator artifact in Maven Central and resolve it successfully; its absence from the local ivy repository afterwards seems to be the root of the problem. It's possible there is a bug in Spark's package deployment. Closing this for now.
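A possibly cleaner workaround than copying a jar by hand (untested for this report, and assuming the resolution failure is specific to the transitive `org.apache.curator:apache-curator` artifact) is to exclude that artifact at resolution time with Spark's `--exclude-packages` flag, available since Spark 2.0:

```shell
# Exclude the problematic transitive curator artifact while still pulling
# in elasticsearch-hadoop and its other dependencies.
pyspark --packages org.elasticsearch:elasticsearch-hadoop:2.4.0 \
        --exclude-packages org.apache.curator:apache-curator \
        --repositories http://conjars.org/repo,https://clojars.org/repo
```

Since ES-Hadoop makes no explicit calls to Curator, excluding it from the `--packages` resolution should be safe for this use case.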