Arash-Afshar opened this issue 6 years ago
I tracked it down to this line: https://github.com/CODAIT/spark-bench/blob/be31655ecd8eac5f1b7141cbc5bd6ea640ae0ddc/utils/src/main/scala/com/ibm/sparktc/sparkbench/utils/SparkFuncs.scala#L52
When calling the graph data generator, the output is a .txt file, but the function defined at that line does not recognize txt as a valid extension.
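For illustration, a minimal sketch of an extension-based format check in the spirit of `SparkFuncs.verifyFormatOrThrow` (the object and method names and the set of accepted formats here are assumptions, not the actual spark-bench code): adding a `"txt"` case is what would let the graph generator's output path pass validation.

```scala
// Hypothetical sketch of an extension-based save-format check.
// The real SparkFuncs implementation may differ; names are illustrative.
object FormatCheck {
  def formatFromPath(path: String): Option[String] =
    path.split('.').lastOption.map(_.toLowerCase) match {
      case Some("csv")     => Some("csv")
      case Some("parquet") => Some("parquet")
      // The missing case: accept *.txt, as required by graph-data-generator.
      case Some("txt")     => Some("text")
      // Anything else would trigger "Unrecognized or unspecified save format".
      case _               => None
    }
}
```

With a check like this, `formatFromPath("hdfs:///one-thousand-vertex-graph.txt")` would resolve to a text format instead of falling through to the error branch.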
I don't think it supports text output. You could try changing the output file suffix to .csv.
That would not work. The documentation for the graph data generator states that the output should be *.txt: https://codait.github.io/spark-bench/workloads/data-generator-graph/
I also tried a non-txt extension, and it failed with a different error message telling me to choose txt.
This could be fixed by this pull request: https://github.com/CODAIT/spark-bench/pull/180
Spark-Bench version (version number, tag, or git commit hash):
spark-bench_2.3.0_0.4.0-RELEASE

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc):
Spark 2.2.0, Yarn

Scala version on your cluster:

Your exact configuration file (with system details anonymized for security):
spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "yarn"
      executor-memory = 5G
      num-executors = 5
    }
    workload-suites = [
      {
        descr = "Graph Gen"
        benchmark-output = "console"
        workloads = [
          {
            name = "graph-data-generator"
            vertices = 1000
            output = "hdfs:///one-thousand-vertex-graph.txt"
          }
        ]
      }
    ]
  }]
}
Relevant stacktrace:
18/04/30 22:21:00 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (**:40656) with ID 1
18/04/30 22:21:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager **:40021 with 2.5 GB RAM, BlockManagerId(1, *****, 40021, None)
18/04/30 22:21:15 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Exception in thread "main" java.lang.Exception: Unrecognized or unspecified save format. Please check the file extension or add a file format to your arguments: Some(hdfs:///one-thousand-vertex-graph.txt)
	at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyFormatOrThrow(SparkFuncs.scala:92)
	at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyOutput(SparkFuncs.scala:35)
	at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:49)
	at com.ibm.sparktc.sparkbench.datageneration.GraphDataGen.run(GraphDataGen.scala:90)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially(SuiteKickoff.scala:98)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:72)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:67)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.immutable.Range.foreach(Range.scala:160)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.run(SuiteKickoff.scala:67)
	at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
	at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially(MultipleSuiteKickoff.scala:38)
	at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:28)
	at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:25)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.run(MultipleSuiteKickoff.scala:25)
	at com.ibm.sparktc.sparkbench.cli.CLIKickoff$.main(CLIKickoff.scala:30)
	at com.ibm.sparktc.sparkbench.cli.CLIKickoff.main(CLIKickoff.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/04/30 22:21:15 INFO spark.SparkContext: Invoking stop() from shutdown hook
Description of your problem and any other relevant info:
Despite using "hdfs:///one-thousand-vertex-graph.txt" as the output, it fails with an unrecognized output format error (see the stacktrace above).