Closed mkscala closed 8 years ago
When I tried %elasticsearch and %hive, they do not seem to work; %sql and %pyspark are working fine.
Are you using dylanmei/zeppelin:latest or a custom build? How did you configure the ElasticSearch interpreter? Hive?
I used dylanmei/zeppelin:latest directly, without any changes. Does ElasticSearch come with it, or does it only have the Spark cluster? Have you tried ElasticSearch? If so, can I add an ElasticSearch cluster to a custom build? Please let me know.
Zeppelin only comes with the ElasticSearch interpreter, not ElasticSearch itself. However, using docker-compose it's trivial to add ElasticSearch.
1) Create a new docker-compose.yml, and add an elasticsearch image, like below:

```yaml
zeppelin:
  image: dylanmei/zeppelin:latest
  environment:
    ZEPPELIN_PORT: 8080
  links:
    - elasticsearch:elasticsearch
  ports:
    - 8080:8080

elasticsearch:
  image: elasticsearch:2.3
  ports:
    - 9200:9200
    - 9300:9300
```
Run `docker-compose up` and open Zeppelin on port 8080 as usual.
2) Go to the interpreters section, scroll down to the ElasticSearch interpreter, and change elasticsearch.host from localhost to elasticsearch. When you save, if Zeppelin asks to restart the interpreter, choose yes.
3) Create a new notebook, add these three paragraphs, and run them.

First paragraph:

```
%sh curl -s http://elasticsearch:9200/
```

Second paragraph:

```
%elasticsearch index /testing/test/1 {"hello": "world"}
```

Third paragraph:

```
%elasticsearch search /testing/test
```
You can learn more about the ElasticSearch interpreter here: http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/interpreter/elasticsearch.html
Awesome. Lots of details, thanks a lot. I will try it and let you know. With these additions, can I try the below in your dylanmei/zeppelin:latest container? Basically I want to store data processed in Spark into ElasticSearch. Do you have a sample? I will try it over the weekend anyway.
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html
```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

val conf = ...
val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
```
Hi, do you have a Skype or email ID so that I can clarify a few doubts?
Ok, that is different from using the interpreter. You need to add elasticsearch settings to the Spark interpreter, and load an elasticsearch-spark dependency.
Update your docker-compose:

```yaml
zeppelin:
  image: dylanmei/zeppelin:latest
  environment:
    ZEPPELIN_PORT: 8080
    ZEPPELIN_JAVA_OPTS: >-
      -Dspark.driver.memory=1g
      -Dspark.executor.memory=2g
    SPARK_HOME: /usr/spark
    SPARK_SUBMIT_OPTIONS: >-
      --conf spark.es.nodes=elasticsearch
      --conf spark.es.nodes.wan.only=true
      --conf spark.es.port=9200
    MASTER: local[*]
  links:
    - elasticsearch:elasticsearch
  ports:
    - 8080:8080

elasticsearch:
  image: elasticsearch:2.3
  ports:
    - 9200:9200
    - 9300:9300
```
In a new notebook, add a dependency paragraph:

```
%dep z.load("org.elasticsearch:elasticsearch-spark_2.10:2.2.0")
```
Write to an index:

```scala
%spark
import org.elasticsearch.spark.sql._

case class Thing(id: Integer, name: String)

val things = Seq(
  Thing(1, "a"),
  Thing(2, "b"),
  Thing(3, "c"),
  Thing(4, "d"),
  Thing(5, "e"))

val df1 = sc.parallelize(things).toDF()
EsSparkSQL.saveToEs(df1, "things/thing", Map("es.mapping.id" -> "id"))
```
Read from the index:

```scala
%spark
val df2 = EsSparkSQL.esDF(sqlc, "things/thing")
df2.show()
```
Your output should be:

```
df2: org.apache.spark.sql.DataFrame = [id: bigint, name: string]
+---+----+
| id|name|
+---+----+
|  5|   e|
|  2|   b|
|  4|   d|
|  1|   a|
|  3|   c|
+---+----+
```
Awesome, I will try this. This is a cool example. I basically want to mine raw email/chat transcripts for certain keywords and group the documents (chat/email) accordingly. Have you tried any ML algorithm for this within Spark/ElasticSearch/MLlib?
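The keyword-grouping idea described above can be sketched in plain Scala before reaching for MLlib (this is only a minimal illustration with a made-up, hard-coded keyword list; at Spark scale, MLlib's TF-IDF and KMeans would be the usual tools for grouping documents):

```scala
// Minimal sketch: group transcripts by the keywords they mention.
// The keyword list and document names are made up for illustration.
object KeywordGrouping {
  val keywords = Seq("refund", "shipping", "login")

  // Map each document id to the set of keywords found in its text.
  def tag(docs: Map[String, String]): Map[String, Set[String]] =
    docs.map { case (id, text) =>
      val lower = text.toLowerCase
      id -> keywords.filter(kw => lower.contains(kw)).toSet
    }

  // Invert the tagging: keyword -> ids of documents mentioning it.
  def group(docs: Map[String, String]): Map[String, Set[String]] =
    tag(docs).toSeq
      .flatMap { case (id, kws) => kws.map(kw => kw -> id) }
      .groupBy(_._1)
      .map { case (kw, pairs) => kw -> pairs.map(_._2).toSet }
}
```

Each resulting group could then be written to ElasticSearch with saveToEs, the same way as the DataFrame example.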
That's really interesting. I have not tried anything like that. I have done heavy writing into ElasticSearch with Spark and it works well.
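For heavy writes, elasticsearch-hadoop also exposes batch-sizing settings that can be passed the same way as the other spark.es.* options in SPARK_SUBMIT_OPTIONS (the option names are real elasticsearch-hadoop settings; the values shown are just their defaults, as illustrative starting points for tuning):

```
--conf spark.es.batch.size.entries=1000
--conf spark.es.batch.size.bytes=1mb
--conf spark.es.batch.write.retry.count=3
```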
There is now an ElasticSearch-specific example docker-compose file in the ./examples directory, based on our conversations here. You may need to re-pull the dylanmei/zeppelin:latest image to use it.
Awesome. Thanks for all your help.