Closed tomjirsa closed 7 years ago
The first prototype was developed in a diploma thesis supervised by @xdanos
@xdanos - We need to start the transition to the latest stable Spark.
I don't have much information from Filip, i.e. no major findings.
The upgrade to 2.1.0 should be problem-free as far as the cluster itself is concerned. The applications will probably not work, though; a little rewriting is needed - at least that was the case for Java. However, I'm not familiar with the Python applications for Spark, so I can't judge those.
@xdanos / @cermmik:
Building Spark with Hadoop at Ansible runtime? Good luck :-D
There are a million ways to cache the tar archive locally.
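One way to cache the archive locally can be sketched as follows. This is a hypothetical helper, not part of the repository; the function name and paths are placeholders for whatever the Ansible role ends up calling:

```python
import os
import urllib.request

# Hypothetical sketch of "cache the tar archive locally": download the
# Spark distribution only when the file is not already present, so a
# provisioning run does not re-fetch it every time.
def fetch_if_missing(url, cache_path):
    """Return cache_path; download from url only when the file is absent."""
    if os.path.exists(cache_path):
        return cache_path  # cache hit: no network access at all
    tmp_path = cache_path + ".part"
    urllib.request.urlretrieve(url, tmp_path)  # fetch to a temporary name
    os.replace(tmp_path, cache_path)           # atomic move into place
    return cache_path
```

In Ansible itself the same effect is commonly achieved with a `get_url` task, which also skips the download when the destination file already exists.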
When running the application, the following error occurs:
```
  File "/home/spark/applications/./protocols-statistics/protocols_statistics.py", line 165, in <module>
    input_stream = KafkaUtils.createStream(ssc, args.input_zookeeper, "spark-consumer-" + application_name, {args.input_topic: kafka_partitions})
  File "/opt/spark/spark-bin/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 70, in createStream
  File "/opt/spark/spark-bin/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/spark/spark-bin/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o32.createStream.
: java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:168)
    at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createStream(KafkaUtils.scala:632)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 25 more
```
org/apache/spark/Logging was removed after Spark 1.5.2.
Should we use another library?
spark-streaming-kafka-0-8-assembly_2.11 does the trick. Maven repo
We use Kafka 0.10.x -- spark-streaming-kafka-0-10-assembly_2.11 would be better.
We prefer the stable 0.8 version with Python support - see the Kafka integration guide.
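The root cause of the traceback above is that in Spark 2.x the Kafka integration classes are no longer on the default classpath, so the chosen assembly has to be shipped with the job (via `--packages` or `--jars`). A minimal sketch of building such a `spark-submit` command line for the 0.8 integration; the application filename is a placeholder:

```python
# Sketch of a spark-submit invocation that ships the Kafka 0.8 integration
# with the Python job via --packages, so the org.apache.spark.streaming.kafka
# classes (the ones missing in the traceback above) end up on the classpath.
def build_submit_cmd(app_path,
                     kafka_package="org.apache.spark:"
                                   "spark-streaming-kafka-0-8_2.11:2.1.0"):
    """Assemble the argv list for spark-submit with the Kafka package."""
    return ["spark-submit", "--packages", kafka_package, app_path]

print(" ".join(build_submit_cmd("protocols_statistics.py")))
```

Alternatively, the pre-built spark-streaming-kafka-0-8-assembly_2.11 jar can be passed with `--jars` instead of resolving the package from Maven at submit time.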
Yep.
The protocols-statistics application has been tested - no modification needed. Merged into master in #27 .
Stream4Flow is now running Spark 2.1.0.
Update Stream4Flow to the latest Spark Streaming version 2.1.0
[x] Update Apache Spark Streaming
[x] Update Ansible deployment
[x] Update Applications