Intel-bigdata / HiBench

HiBench is a big data benchmark suite.

Streaming Section for storm does not work /dataGen ERROR #681

Open ghfaha opened 3 years ago

ghfaha commented 3 years ago

Hello, I have a cluster of three nodes running Hadoop, Kafka, ZooKeeper, and Storm. I used the versions you recommend, and I even tested with higher versions of Kafka and ZooKeeper. I am trying to run the streaming benchmark, but I am not able to generate data; it gives me this error:

```
/opt/HiBench/bin/workloads/streaming/identity/prepare/dataGen.sh
patching args=
Parsing conf: /opt/HiBench/conf/hadoop.conf
Parsing conf: /opt/HiBench/conf/hibench.conf
Parsing conf: /opt/HiBench/conf/spark.conf
Parsing conf: /opt/HiBench/conf/storm.conf
Parsing conf: /opt/HiBench/conf/workloads/streaming/identity.conf
probe sleep jar: /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.2.2-tests.jar
start StreamingIdentityPrepare bench
Sending streaming data to kafka, periodically:
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/bin/java -Xmx1024M -server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=bin/../logs -cp /opt/HiBench/autogen/target/autogen-8.0-SNAPSHOT-jar-with-dependencies.jar com.intel.hibench.datagen.streaming.DataGenerator /opt/HiBench/report/identity/prepare/conf/sparkbench/sparkbench.conf hdfs://cmu-clinique.novalocal:9000/HiBench/Streaming/Seed/uservisits 0 hdfs://cmu-clinique.novalocal:9000/HiBench/Streaming/Kmeans/Samples 0
StreamBench Data Generator
  Interval Span       : 50 ms
  Record Per Interval : 5
  Record Length       : 200 bytes
  Producer Number     : 1
  Total Records       : -1 [Infinity]
  Total Rounds        : -1 [Infinity]
  Kafka Topic         : identity

Estimated Speed : 100 records/second  0.02 Mb/second

log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
	at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:52)
	at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
	at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:55)
	at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

finish StreamingIdentityPrepare bench
```

And if I try to run the streaming application, I get an error as well. My application is the Storm cluster (bin/workloads/streaming/identity/storm/run.sh). When I run the command it says "finish Storm Identity bench" and it creates the identity topology correctly, but I see this error in my Storm UI, in the Bolt section:

```
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/identity/partitions
	at org.apache.storm.kafka.DynamicBrokersReader.getBrokerInfo(DynamicBrokersReader.java:100)
	at org.apache.storm.kafka.trident.ZkBrokerReader.<init>(ZkBrokerReader.java:44)
	at org.apache.storm.kafka.KafkaUtils.makeBrokerReader(KafkaUtils.java:58)
	at org.apache.storm.kafka.KafkaSpout.open(KafkaSpout.java:77)
	at org.apache.storm.daemon.executor$fn__7885$fn__7900.invoke(executor.clj:601)
	at org.apache.storm.util$async_loop$fn__625.invoke(util.clj:482)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/identity/partitions
	at org.apache.storm.kafka.DynamicBrokersReader.getNumPartitions(DynamicBrokersReader.java:114)
	at org.apache.storm.kafka.DynamicBrokersReader.getBrokerInfo(DynamicBrokersReader.java:84)
	... 7 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/identity/partitions
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:200)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
	at org.apache.storm.kafka.DynamicBrokersReader.getNumPartitions(DynamicBrokersReader.java:111)
	... 8 more
```
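For anyone debugging the same NoNode error: the exact ZooKeeper path the Storm Kafka spout reads can be inspected directly (a quick check; localhost:2181 is a placeholder for the real ZooKeeper address):

```sh
# Inspect the broker/topic registration the spout expects to find.
# If /brokers/topics does not contain "identity", the topic was never created.
bin/zkCli.sh -server localhost:2181 <<'EOF'
ls /brokers/topics
ls /brokers/topics/identity/partitions
EOF
```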

If I try to generate the report with bin/workloads/streaming/identity/common/metrics_reader.sh, it generates an empty report CSV file. Would you please guide me? Thanks.
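For reference, a quick way to check whether the identity topic received any records at all (Kafka 0.8.x console-consumer syntax; localhost:2181 is again a placeholder for the real ZooKeeper address):

```sh
# Dump the first few records from the topic; if this prints nothing,
# the data generator never produced anything and the report will be empty.
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic identity --from-beginning --max-messages 10
```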

ghfaha commented 3 years ago

Hello again. Can anybody help me with this? After I run /opt/HiBench/bin/workloads/streaming/identity/prepare/dataGen.sh and get the error above, I cannot see any topic named identity in my Kafka topics list!
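For reference, this is how the topic list can be checked, and how the topic could be created by hand (Kafka 0.8.x syntax; localhost:2181 is a placeholder for the real ZooKeeper address, and the replication/partition settings are just an example):

```sh
# List all topics registered in ZooKeeper -- "identity" should appear here
bin/kafka-topics.sh --list --zookeeper localhost:2181

# If it is missing, the topic can be created manually before rerunning dataGen.sh
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic identity
```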

ghfaha commented 3 years ago

My environment:

- Cluster of three nodes
- Hadoop 3.2.2
- Spark 3.1.1
- Storm 1.0.1 (I also tested with the latest version)
- Kafka 0.8.2.2 (I also tested with a newer version; the error is the same)
- ZooKeeper 3.4.8 (I also tested with other versions such as 3.5.6)

The error is:

```
bin/workloads/streaming/identity/prepare/dataGen.sh
patching args=
Parsing conf: /opt/HiBench/conf/hadoop.conf
Parsing conf: /opt/HiBench/conf/hibench.conf
Parsing conf: /opt/HiBench/conf/spark.conf
Parsing conf: /opt/HiBench/conf/storm.conf
Parsing conf: /opt/HiBench/conf/workloads/streaming/identity.conf
probe sleep jar: /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.2.2-tests.jar
start StreamingIdentityPrepare bench
Sending streaming data to kafka, periodically:
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/bin/java -Xmx1024M -server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=bin/../logs -cp /opt/HiBench/autogen/target/autogen-8.0-SNAPSHOT-jar-with-dependencies.jar com.intel.hibench.datagen.streaming.DataGenerator /opt/HiBench/report/identity/prepare/conf/sparkbench/sparkbench.conf hdfs://cmu-clinique.novalocal:9000/HiBench/Streaming/Seed/uservisits 0 hdfs://cmu-clinique.novalocal:9000/HiBench/Streaming/Kmeans/Samples 0
StreamBench Data Generator
  Record Per Interval : 5
  Record Length       : 200 bytes
  Producer Number     : 1
  Total Records       : -1 [Infinity]
  Total Rounds        : -1 [Infinity]
  Kafka Topic         : identity

Estimated Speed : 100 records/second  0.02 Mb/second

[2021-08-12 10:49:06,653] INFO ProducerConfig values:
	compression.type = none
	metric.reporters = []
	metadata.max.age.ms = 300000
	metadata.fetch.timeout.ms = 60000
	acks = 1
	batch.size = 16384
	reconnect.backoff.ms = 10
	bootstrap.servers = [cmu-clinique.novalocal:9092, cmu-clinique-node-1.novalocal:9092, cmu-node2.novalocal:9092]
	receive.buffer.bytes = 32768
	retry.backoff.ms = 100
	buffer.memory = 33554432
	timeout.ms = 30000
	key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
	retries = 0
	max.request.size = 1048576
	block.on.buffer.full = true
	value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
	metrics.sample.window.ms = 30000
	send.buffer.bytes = 131072
	max.in.flight.requests.per.connection = 5
	metrics.num.samples = 2
	linger.ms = 0
	client.id =
 (org.apache.kafka.clients.producer.ProducerConfig)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
	at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:52)
	at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
	at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:53)
	at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

finish StreamingIdentityPrepare bench
```
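For anyone reproducing this: the generator reads its seed dataset from the HDFS path shown in the java command above, so it is worth confirming the seed files actually exist (standard HDFS shell; the path is taken from the log):

```sh
# Confirm the seed dataset the generator reads is present in HDFS
hdfs dfs -ls /HiBench/Streaming/Seed/uservisits
```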

ghfaha commented 3 years ago

I added a dependency like this to the main pom file of the project:

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>27.0-jre</version>
</dependency>
```

Now that error has changed:

```
 (org.apache.kafka.clients.producer.ProducerConfig)
[2021-08-13 09:40:14,305] INFO fs.default.name is deprecated. Instead, use fs.defaultFS (org.apache.hadoop.conf.Configuration.deprecation)
[2021-08-13 09:40:14,384] WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (org.apache.hadoop.util.NativeCodeLoader)
Fail to get reader from path: hdfs://.......novalocal:9000/HiBench/Streaming/Seed/uservisits
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs"
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3353)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3373)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:125)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3424)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3392)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:485)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:233)
	at com.intel.hibench.datagen.streaming.util.SourceFileReader.getReader(SourceFileReader.java:33)
	at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:56)
	at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
	at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:53)
	at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
	at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:59)
	at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
	at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:53)
	at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

It seems this part of the project does not support Hadoop 3.2, but I am not sure why that happens.
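A `No FileSystem for scheme "hdfs"` error usually means the HDFS client classes, or their META-INF/services registration, are missing from the classpath. One quick way to check whether the shaded autogen jar bundles them at all (a diagnostic sketch, using the jar path from the log above):

```sh
# If either pattern comes back empty, the fat jar is missing the HDFS
# FileSystem implementation or its service registration, which is exactly
# what produces: No FileSystem for scheme "hdfs"
jar tf /opt/HiBench/autogen/target/autogen-8.0-SNAPSHOT-jar-with-dependencies.jar \
  | grep -E 'hdfs/DistributedFileSystem|META-INF/services/org.apache.hadoop.fs.FileSystem'
```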

whistleboys commented 1 year ago

Hello! I'm having the exact same problem as you! I have tried many methods but still can't solve it. Did you solve it, and if so, how? Could you please share it? ^v^