CODAIT / spark-bench

Benchmark Suite for Apache Spark
https://codait.github.io/spark-bench/
Apache License 2.0
239 stars · 123 forks

How to run Logistic Regression test? #179

Open · lovengulu opened this issue 5 years ago

lovengulu commented 5 years ago

Spark-Bench version (version number, tag, or git commit hash)

spark-bench_2.3.0_0.4.0-RELEASE

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

CentOS 7.4, HDP 2.6.5.0, standalone Spark 2 (2.3.0)

Scala version on your cluster

Not sure

Your exact configuration file (with system details anonymized for security)

spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "yarn" // FILL IN YOUR MASTER HERE
      // num-executors = 3
      // executor-memory = "XXXXXXX" // FILL IN YOUR EXECUTOR MEMORY
    }
    conf = {
      // Any configuration you need for your setup goes here, like:
      "spark.executor.cores" = "3"
      "spark.executor.memory" = "5g"
      "spark.driver.memory"   = "5g"
      // "spark.dynamicAllocation.enabled" = "false"
    }
    workload-suites = [
      {
        descr = "LogisticRegression Workloads"
        benchmark-output = "console"
        workloads = [
          {
            //name     = "logisticregression"
            name     = "lr-bml"
            input    = "hdfs:///tmp/data/data-10M.parquet" // training dataset
            testfile = "hdfs:///tmp/data/data-50M.parquet" // testing dataset
          }
        ]
      }
    ]
  }]
}
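
The stack trace below fails inside LogisticRegressionWorkload.scala at lines 73-75 (textFile followed by toDouble), which suggests that the lr-bml workload reads "input" and "testfile" as plain text and parses each line as comma-separated doubles. A Parquet file, whose contents start with the magic bytes "PAR1", cannot be parsed that way and raises java.lang.NumberFormatException. Below is a minimal sketch of that load path, under the assumption of a hypothetical CSV file where each line is "label,feature1,feature2,...":

import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LrCsvLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("lr-csv-load-sketch").getOrCreate()

    // Hypothetical CSV path; every column on a line must be numeric.
    val path = "hdfs:///tmp/data/lr-train.csv"

    // Read the file as text and convert each column to Double, mirroring the
    // textFile -> split -> toDouble chain visible in the trace. Binary Parquet
    // bytes such as "PAR1..." cannot be converted and throw NumberFormatException.
    val points = spark.sparkContext.textFile(path).map { line =>
      val cols = line.split(",").map(_.toDouble)
      LabeledPoint(cols.head, Vectors.dense(cols.tail))
    }
    points.cache()

    println(s"loaded ${points.count()} labeled points from $path")
    spark.stop()
  }
}

Under that assumption, pointing input and testfile at CSV files (for example, data written with df.write.csv(...)) rather than .parquet files should avoid the parse failure; whether some versions of the workload also accept Parquet input is not something this log establishes.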

Relevant stacktrace

[root@tug190-1 spark-bench_2.3.0_0.4.0-RELEASE]# sudo -u hdfs ./bin/spark-bench.sh examples/yf-logisticRegression.conf
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
18/10/18 06:32:53 INFO CLIKickoff$: args received: {"spark-bench":{"spark-submit-config":[{"conf":{"spark.driver.memory":"5g","spark.executor.cores":"3","spark.executor.memory":"5g"},"spark-args":{"master":"yarn"},"workload-suites":[{"benchmark-output":"console","descr":"LogisticRegression Workloads","workloads":[{"input":"hdfs:///tmp/data/data-10M.parquet","name":"lr-bml","testfile":"hdfs:///tmp/data/data-50M.parquet"}]}]}]}}
18/10/18 06:32:54 INFO SparkContext: Running Spark version 2.3.0.2.6.5.0-292
18/10/18 06:32:54 INFO SparkContext: Submitted application: com.ibm.sparktc.sparkbench.cli.CLIKickoff
18/10/18 06:32:54 INFO SecurityManager: Changing view acls to: hdfs
18/10/18 06:32:54 INFO SecurityManager: Changing modify acls to: hdfs
18/10/18 06:32:54 INFO SecurityManager: Changing view acls groups to:
18/10/18 06:32:54 INFO SecurityManager: Changing modify acls groups to:
18/10/18 06:32:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hdfs); groups with view permissions: Set(); users  with modify permissions: Set(hdfs); groups with modify permissions: Set()
18/10/18 06:32:54 INFO Utils: Successfully started service 'sparkDriver' on port 45479.
18/10/18 06:32:54 INFO SparkEnv: Registering MapOutputTracker
18/10/18 06:32:54 INFO SparkEnv: Registering BlockManagerMaster
18/10/18 06:32:54 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/10/18 06:32:54 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/10/18 06:32:54 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b37be4cf-6615-42ec-8bca-dd980d2b7c8d
18/10/18 06:32:54 INFO MemoryStore: MemoryStore started with capacity 2.5 GB
18/10/18 06:32:54 INFO SparkEnv: Registering OutputCommitCoordinator
18/10/18 06:32:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/10/18 06:32:54 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://tug190-1.yfsubnet:4040
18/10/18 06:32:54 INFO SparkContext: Added JAR file:/opt/spark-bench_2.3.0_0.4.0-RELEASE/lib/spark-bench-2.3.0_0.4.0-RELEASE.jar at spark://tug190-1.yfsubnet:45479/jars/spark-bench-2.3.0_0.4.0-RELEASE.jar with timestamp 1539858774960
18/10/18 06:32:55 INFO RMProxy: Connecting to ResourceManager at tug190-1.yfsubnet/10.200.10.191:8050
18/10/18 06:32:55 INFO Client: Requesting a new application from cluster with 1 NodeManagers
18/10/18 06:32:56 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (40960 MB per container)
18/10/18 06:32:56 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/10/18 06:32:56 INFO Client: Setting up container launch context for our AM
18/10/18 06:32:56 INFO Client: Setting up the launch environment for our AM container
18/10/18 06:32:56 INFO Client: Preparing resources for our AM container
18/10/18 06:32:57 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://tug190-1.yfsubnet:8020/hdp/apps/2.6.5.0-292/spark2/spark2-hdp-yarn-archive.tar.gz
18/10/18 06:32:57 INFO Client: Source and destination file systems are the same. Not copying hdfs://tug190-1.yfsubnet:8020/hdp/apps/2.6.5.0-292/spark2/spark2-hdp-yarn-archive.tar.gz
18/10/18 06:32:57 INFO Client: Uploading resource file:/tmp/spark-b1d2dbb7-2f41-42f0-bd49-dd8f3c5ebecb/__spark_conf__3844613980751995533.zip -> hdfs://tug190-1.yfsubnet:8020/user/hdfs/.sparkStaging/application_1539852791165_0007/__spark_conf__.zip
18/10/18 06:32:57 INFO SecurityManager: Changing view acls to: hdfs
18/10/18 06:32:57 INFO SecurityManager: Changing modify acls to: hdfs
18/10/18 06:32:57 INFO SecurityManager: Changing view acls groups to:
18/10/18 06:32:57 INFO SecurityManager: Changing modify acls groups to:
18/10/18 06:32:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hdfs); groups with view permissions: Set(); users  with modify permissions: Set(hdfs); groups with modify permissions: Set()
18/10/18 06:32:57 INFO Client: Submitting application application_1539852791165_0007 to ResourceManager
18/10/18 06:32:58 INFO YarnClientImpl: Submitted application application_1539852791165_0007
18/10/18 06:32:58 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1539852791165_0007 and attemptId None
18/10/18 06:32:59 INFO Client: Application report for application_1539852791165_0007 (state: ACCEPTED)
18/10/18 06:32:59 INFO Client:
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1539858778005
         final status: UNDEFINED
         tracking URL: http://tug190-1.yfsubnet:8088/proxy/application_1539852791165_0007/
         user: hdfs
18/10/18 06:33:00 INFO Client: Application report for application_1539852791165_0007 (state: ACCEPTED)
18/10/18 06:33:01 INFO Client: Application report for application_1539852791165_0007 (state: ACCEPTED)
18/10/18 06:33:02 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> tug190-1.yfsubnet, PROXY_URI_BASES -> http://tug190-1.yfsubnet:8088/proxy/application_1539852791165_0007), /proxy/application_1539852791165_0007
18/10/18 06:33:02 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/10/18 06:33:02 INFO Client: Application report for application_1539852791165_0007 (state: ACCEPTED)
18/10/18 06:33:02 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/10/18 06:33:03 INFO Client: Application report for application_1539852791165_0007 (state: RUNNING)
18/10/18 06:33:03 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 10.200.10.191
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1539858778005
         final status: UNDEFINED
         tracking URL: http://tug190-1.yfsubnet:8088/proxy/application_1539852791165_0007/
         user: hdfs
18/10/18 06:33:03 INFO YarnClientSchedulerBackend: Application application_1539852791165_0007 has started running.
18/10/18 06:33:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41481.
18/10/18 06:33:03 INFO NettyBlockTransferService: Server created on tug190-1.yfsubnet:41481
18/10/18 06:33:03 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/10/18 06:33:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, tug190-1.yfsubnet, 41481, None)
18/10/18 06:33:03 INFO BlockManagerMasterEndpoint: Registering block manager tug190-1.yfsubnet:41481 with 2.5 GB RAM, BlockManagerId(driver, tug190-1.yfsubnet, 41481, None)
18/10/18 06:33:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, tug190-1.yfsubnet, 41481, None)
18/10/18 06:33:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, tug190-1.yfsubnet, 41481, None)
18/10/18 06:33:03 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/application_1539852791165_0007
18/10/18 06:33:05 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.200.10.191:50456) with ID 1
18/10/18 06:33:05 INFO BlockManagerMasterEndpoint: Registering block manager tug190-1.yfsubnet:45980 with 2.5 GB RAM, BlockManagerId(1, tug190-1.yfsubnet, 45980, None)
18/10/18 06:33:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.200.10.191:50460) with ID 2
18/10/18 06:33:06 INFO BlockManagerMasterEndpoint: Registering block manager tug190-1.yfsubnet:44023 with 2.5 GB RAM, BlockManagerId(2, tug190-1.yfsubnet, 44023, None)
18/10/18 06:33:06 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/10/18 06:33:06 INFO SharedState: loading hive config file: file:/etc/spark2/2.6.5.0-292/0/hive-site.xml
18/10/18 06:33:07 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark-bench_2.3.0_0.4.0-RELEASE/spark-warehouse').
18/10/18 06:33:07 INFO SharedState: Warehouse path is 'file:/opt/spark-bench_2.3.0_0.4.0-RELEASE/spark-warehouse'.
18/10/18 06:33:07 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/10/18 06:33:08 INFO SparkContext: Starting job: parquet at SparkFuncs.scala:124
18/10/18 06:33:08 INFO DAGScheduler: Got job 0 (parquet at SparkFuncs.scala:124) with 1 output partitions
18/10/18 06:33:08 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at SparkFuncs.scala:124)
18/10/18 06:33:08 INFO DAGScheduler: Parents of final stage: List()
18/10/18 06:33:08 INFO DAGScheduler: Missing parents: List()
18/10/18 06:33:08 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at SparkFuncs.scala:124), which has no missing parents
18/10/18 06:33:08 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 89.9 KB, free 2.5 GB)
18/10/18 06:33:08 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.0 KB, free 2.5 GB)
18/10/18 06:33:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on tug190-1.yfsubnet:41481 (size: 34.0 KB, free: 2.5 GB)
18/10/18 06:33:08 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
18/10/18 06:33:08 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at SparkFuncs.scala:124) (first 15 tasks are for partitions Vector(0))
18/10/18 06:33:08 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
18/10/18 06:33:08 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, tug190-1.yfsubnet, executor 2, partition 0, PROCESS_LOCAL, 8089 bytes)
18/10/18 06:33:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on tug190-1.yfsubnet:44023 (size: 34.0 KB, free: 2.5 GB)
18/10/18 06:33:10 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2133 ms on tug190-1.yfsubnet (executor 2) (1/1)
18/10/18 06:33:10 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/10/18 06:33:10 INFO DAGScheduler: ResultStage 0 (parquet at SparkFuncs.scala:124) finished in 2.256 s
18/10/18 06:33:10 INFO DAGScheduler: Job 0 finished: parquet at SparkFuncs.scala:124, took 2.320101 s
18/10/18 06:33:10 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 344.1 KB, free 2.5 GB)
18/10/18 06:33:10 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 31.6 KB, free 2.5 GB)
18/10/18 06:33:10 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on tug190-1.yfsubnet:41481 (size: 31.6 KB, free: 2.5 GB)
18/10/18 06:33:10 INFO SparkContext: Created broadcast 1 from textFile at LogisticRegressionWorkload.scala:73
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 15
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 2
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 14
18/10/18 06:33:11 INFO BlockManagerInfo: Removed broadcast_0_piece0 on tug190-1.yfsubnet:41481 in memory (size: 34.0 KB, free: 2.5 GB)
18/10/18 06:33:11 INFO BlockManagerInfo: Removed broadcast_0_piece0 on tug190-1.yfsubnet:44023 in memory (size: 34.0 KB, free: 2.5 GB)
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 3
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 1
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 25
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 17
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 5
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 23
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 7
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 16
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 18
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 19
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 24
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 4
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 9
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 20
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 8
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 26
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 6
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 12
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 13
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 22
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 10
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 11
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 21
18/10/18 06:33:11 INFO ContextCleaner: Cleaned accumulator 0
18/10/18 06:33:11 INFO CodeGenerator: Code generated in 285.084626 ms
18/10/18 06:33:11 INFO FileInputFormat: Total input paths to process : 10
18/10/18 06:33:12 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 344.2 KB, free 2.5 GB)
18/10/18 06:33:12 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 31.6 KB, free 2.5 GB)
18/10/18 06:33:12 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on tug190-1.yfsubnet:41481 (size: 31.6 KB, free: 2.5 GB)
18/10/18 06:33:12 INFO SparkContext: Created broadcast 2 from textFile at LogisticRegressionWorkload.scala:73
18/10/18 06:33:12 INFO FileInputFormat: Total input paths to process : 10
18/10/18 06:33:12 INFO CodeGenerator: Code generated in 16.343671 ms
18/10/18 06:33:12 INFO CodeGenerator: Code generated in 16.52826 ms
18/10/18 06:33:12 INFO SparkContext: Starting job: count at LogisticRegressionWorkload.scala:92
18/10/18 06:33:12 INFO DAGScheduler: Registering RDD 8 (cache at LogisticRegressionWorkload.scala:84)
18/10/18 06:33:12 INFO DAGScheduler: Registering RDD 24 (count at LogisticRegressionWorkload.scala:92)
18/10/18 06:33:12 INFO DAGScheduler: Got job 1 (count at LogisticRegressionWorkload.scala:92) with 1 output partitions
18/10/18 06:33:12 INFO DAGScheduler: Final stage: ResultStage 3 (count at LogisticRegressionWorkload.scala:92)
18/10/18 06:33:12 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
18/10/18 06:33:12 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
18/10/18 06:33:12 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[8] at cache at LogisticRegressionWorkload.scala:84), which has no missing parents
18/10/18 06:33:12 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 23.4 KB, free 2.5 GB)
18/10/18 06:33:12 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.3 KB, free 2.5 GB)
18/10/18 06:33:12 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on tug190-1.yfsubnet:41481 (size: 8.3 KB, free: 2.5 GB)
18/10/18 06:33:12 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1039
18/10/18 06:33:12 INFO DAGScheduler: Submitting 20 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[8] at cache at LogisticRegressionWorkload.scala:84) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
18/10/18 06:33:12 INFO YarnScheduler: Adding task set 1.0 with 20 tasks
18/10/18 06:33:12 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, tug190-1.yfsubnet, executor 1, partition 0, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:12 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, tug190-1.yfsubnet, executor 2, partition 1, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:12 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, tug190-1.yfsubnet, executor 1, partition 2, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:12 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, tug190-1.yfsubnet, executor 2, partition 3, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:12 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, tug190-1.yfsubnet, executor 1, partition 4, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:12 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6, tug190-1.yfsubnet, executor 2, partition 5, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:12 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on tug190-1.yfsubnet:44023 (size: 8.3 KB, free: 2.5 GB)
18/10/18 06:33:12 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on tug190-1.yfsubnet:44023 (size: 31.6 KB, free: 2.5 GB)
18/10/18 06:33:12 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on tug190-1.yfsubnet:45980 (size: 8.3 KB, free: 2.5 GB)
18/10/18 06:33:13 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 7, tug190-1.yfsubnet, executor 2, partition 6, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 8, tug190-1.yfsubnet, executor 2, partition 7, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 9, tug190-1.yfsubnet, executor 2, partition 8, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10, tug190-1.yfsubnet, executor 2, partition 9, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 WARN TaskSetManager: Lost task 8.0 in stage 1.0 (TID 9, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "PAR1 ������"
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
        at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:75)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:75)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:74)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:295)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:266)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

18/10/18 06:33:13 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "�?&��Dѿ6Qs�?���l€�?��?/��˿�sTN�Ӡ�>6���� �pD�?������?��G��ܿ�t�<Rg�?���~�������C��?�B��"

18/10/18 06:33:13 WARN TaskSetManager: Lost task 3.0 in stage 1.0 (TID 4, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "��俲;3԰�?�ѹ��f�?�'VM�?R�����?����Ɔ�?B"

18/10/18 06:33:13 WARN TaskSetManager: Lost task 5.0 in stage 1.0 (TID 6, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "�b���<0�.N��."

18/10/18 06:33:13 INFO TaskSetManager: Starting task 5.1 in stage 1.0 (TID 11, tug190-1.yfsubnet, executor 2, partition 5, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Starting task 3.1 in stage 1.0 (TID 12, tug190-1.yfsubnet, executor 2, partition 3, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 WARN TaskSetManager: Lost task 7.0 in stage 1.0 (TID 8, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "ɼ���")}<��?4��Q�Gܿ�o���������ѿn��(�H�����D�S����fE��?��Wh��?�~t�'V�?�5~��ۿί�_5�?�hA�q���%kЇ�?pՂ^+�㿂���F��?P� �zX�?bT�       gw��"

18/10/18 06:33:13 INFO TaskSetManager: Lost task 6.0 in stage 1.0 (TID 7) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "PAR1 ������") [duplicate 1]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 6.1 in stage 1.0 (TID 13, tug190-1.yfsubnet, executor 2, partition 6, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 WARN TaskSetManager: Lost task 9.0 in stage 1.0 (TID 10, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "����b� @$�1�ܿ��:�O�˿��Gg�?�~K�vI�?N�/��ǩ?&g��pۿ��s�Q�?zЗAy�?c&h�Gn�?z"

18/10/18 06:33:13 INFO TaskSetManager: Starting task 9.1 in stage 1.0 (TID 14, tug190-1.yfsubnet, executor 2, partition 9, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 5.1 in stage 1.0 (TID 11) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "�b���<0�.N���.") [duplicate 1]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 5.2 in stage 1.0 (TID 15, tug190-1.yfsubnet, executor 2, partition 5, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 3.1 in stage 1.0 (TID 12) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "��俲;3԰�?�ѹ��f�?�'VM�?R�����?����Ɔ�?B") [duplicate 1]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 3.2 in stage 1.0 (TID 16, tug190-1.yfsubnet, executor 2, partition 3, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 6.1 in stage 1.0 (TID 13) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "PAR1 ������") [duplicate 2]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 6.2 in stage 1.0 (TID 17, tug190-1.yfsubnet, executor 2, partition 6, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 9.1 in stage 1.0 (TID 14) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "����b� @$N�1�ܿ��:�O�˿��Gg�?�~K�vI�?N�/��ǩ?&g��pۿ��s�Q�?zЗAy�?c&h�Gn�?z") [duplicate 1]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 9.2 in stage 1.0 (TID 18, tug190-1.yfsubnet, executor 2, partition 9, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 5.2 in stage 1.0 (TID 15) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "�b���<0�.N���.") [duplicate 2]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 5.3 in stage 1.0 (TID 19, tug190-1.yfsubnet, executor 2, partition 5, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 3.2 in stage 1.0 (TID 16) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "��俲;3԰�?�ѹ��f�?�'VM�?R�����?����Ɔ�?B") [duplicate 2]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 3.3 in stage 1.0 (TID 20, tug190-1.yfsubnet, executor 2, partition 3, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 6.2 in stage 1.0 (TID 17) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "PAR1 ������") [duplicate 3]
18/10/18 06:33:13 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on tug190-1.yfsubnet:45980 (size: 31.6 KB, free: 2.5 GB)
18/10/18 06:33:13 INFO TaskSetManager: Starting task 6.3 in stage 1.0 (TID 21, tug190-1.yfsubnet, executor 2, partition 6, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 9.2 in stage 1.0 (TID 18) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "����b� @$N�1�ܿ��:�O�˿��Gg�?�~K�vI�?N�/��ǩ?&g��pۿ��s�Q�?zЗAy�?c&h�Gn�?z") [duplicate 2]
18/10/18 06:33:13 INFO TaskSetManager: Starting task 9.3 in stage 1.0 (TID 22, tug190-1.yfsubnet, executor 2, partition 9, NODE_LOCAL, 7971 bytes)
18/10/18 06:33:13 INFO TaskSetManager: Lost task 5.3 in stage 1.0 (TID 19) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "�b���<0�.N���.") [duplicate 3]
18/10/18 06:33:13 ERROR TaskSetManager: Task 5 in stage 1.0 failed 4 times; aborting job
18/10/18 06:33:13 INFO TaskSetManager: Lost task 3.3 in stage 1.0 (TID 20) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "��俲;3԰�?�ѹ��f�?�'VM�?R�����?����Ɔ�?B") [duplicate 3]
18/10/18 06:33:13 INFO YarnScheduler: Cancelling stage 1
18/10/18 06:33:13 INFO YarnScheduler: Stage 1 was cancelled
18/10/18 06:33:13 INFO DAGScheduler: ShuffleMapStage 1 (cache at LogisticRegressionWorkload.scala:84) failed in 0.937 s due to Job aborted due to stage failure: Task 5 in stage 1.0 failed 4 times, most recent failure: Lost task 5.3 in stage 1.0 (TID 19, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "�b���<0�.N���."

Driver stacktrace:
18/10/18 06:33:13 INFO TaskSetManager: Lost task 9.3 in stage 1.0 (TID 22) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "����b� @$N�1�ܿ��:�O�˿��Gg�?�~K�vI�?N�/��ǩ?&g��pۿ��s�Q�?zЗAy�?c&h�Gn�?z") [duplicate 3]
18/10/18 06:33:13 INFO DAGScheduler: Job 1 failed: count at LogisticRegressionWorkload.scala:92, took 0.963284 s
18/10/18 06:33:13 INFO TaskSetManager: Lost task 6.3 in stage 1.0 (TID 21) on tug190-1.yfsubnet, executor 2: java.lang.NumberFormatException (For input string: "PAR1 ������") [duplicate 4]
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 4 times, most recent failure: Lost task 5.3 in stage 1.0 (TID 19, tug190-1.yfsubnet, executor 2): java.lang.NumberFormatException: For input string: "�b���<0�.N���."
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
        at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:75)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:75)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:74)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:295)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:266)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:297)
        at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2770)
        at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2769)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
        at org.apache.spark.sql.Dataset.count(Dataset.scala:2769)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$3.apply(LogisticRegressionWorkload.scala:92)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$3.apply(LogisticRegressionWorkload.scala:92)
        at com.ibm.sparktc.sparkbench.utils.GeneralFunctions$.time(GeneralFunctions.scala:48)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload.doWorkload(LogisticRegressionWorkload.scala:92)
        at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:60)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload.run(LogisticRegressionWorkload.scala:62)
        at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
        at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially(SuiteKickoff.scala:98)
        at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:72)
        at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:67)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.Range.foreach(Range.scala:160)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.run(SuiteKickoff.scala:67)
        at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
        at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially(MultipleSuiteKickoff.scala:38)
        at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:28)
        at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:25)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.run(MultipleSuiteKickoff.scala:25)
        at com.ibm.sparktc.sparkbench.cli.CLIKickoff$.main(CLIKickoff.scala:30)
        at com.ibm.sparktc.sparkbench.cli.CLIKickoff.main(CLIKickoff.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NumberFormatException: For input string: "�b���<0�.N���."
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
        at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:75)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:75)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:75)
        at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:74)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:295)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:266)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
18/10/18 06:33:13 INFO SparkContext: Invoking stop() from shutdown hook
18/10/18 06:33:13 INFO SparkUI: Stopped Spark web UI at http://tug190-1.yfsubnet:4040
18/10/18 06:33:13 INFO YarnClientSchedulerBackend: Interrupting monitor thread
18/10/18 06:33:13 INFO YarnClientSchedulerBackend: Shutting down all executors
18/10/18 06:33:13 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/10/18 06:33:13 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
18/10/18 06:33:13 INFO YarnClientSchedulerBackend: Stopped
18/10/18 06:33:13 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/10/18 06:33:13 INFO MemoryStore: MemoryStore cleared
18/10/18 06:33:13 INFO BlockManager: BlockManager stopped
18/10/18 06:33:13 INFO BlockManagerMaster: BlockManagerMaster stopped
18/10/18 06:33:13 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/10/18 06:33:13 INFO SparkContext: Successfully stopped SparkContext
18/10/18 06:33:13 INFO ShutdownHookManager: Shutdown hook called
18/10/18 06:33:13 INFO ShutdownHookManager: Deleting directory /tmp/spark-b1d2dbb7-2f41-42f0-bd49-dd8f3c5ebecb
18/10/18 06:33:13 INFO ShutdownHookManager: Deleting directory /tmp/spark-14c520e0-74ca-45ec-b090-43c1e6db4b4f
Exception in thread "main" java.lang.Exception: spark-submit failed to complete properly given these arguments:
        /usr/hdp/2.6.5.0-292/spark2/bin/spark-submit
--class
com.ibm.sparktc.sparkbench.cli.CLIKickoff
--master
yarn
--conf
spark.executor.memory=5g
--conf
spark.driver.memory=5g
--conf
spark.executor.cores=3
/opt/spark-bench_2.3.0_0.4.0-RELEASE/lib/spark-bench-2.3.0_0.4.0-RELEASE.jar
{"spark-bench":{"spark-submit-config":[{"conf":{"spark.driver.memory":"5g","spark.executor.cores":"3","spark.executor.memory":"5g"},"spark-args":{"master":"yarn"},"workload-suites":[{"benchmark-output":"console","descr":"LogisticRegression Workloads","workloads":[{"input":"hdfs:///tmp/data/data-10M.parquet","name":"lr-bml","testfile":"hdfs:///tmp/data/data-50M.parquet"}]}]}]}}
        at com.ibm.sparktc.sparkbench.sparklaunch.submission.sparksubmit.SparkSubmit$.submit(SparkSubmit.scala:51)
        at com.ibm.sparktc.sparkbench.sparklaunch.submission.sparksubmit.SparkSubmit$.launch(SparkSubmit.scala:34)
        at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$.com$ibm$sparktc$sparkbench$sparklaunch$SparkLaunch$$launch$1(SparkLaunch.scala:58)
        at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$$anonfun$launchJobs$2.apply(SparkLaunch.scala:65)
        at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$$anonfun$launchJobs$2.apply(SparkLaunch.scala:65)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$.launchJobs(SparkLaunch.scala:65)
        at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$.main(SparkLaunch.scala:38)
        at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch.main(SparkLaunch.scala)

[root@tug190-1 spark-bench_2.3.0_0.4.0-RELEASE]#

Description of your problem and any other relevant info

I'm getting an error that the input is not valid. I created the input file and the test file using the conf file below. I used 'parquet' because when I used 'csv' I received an error message telling me to use 'parquet' instead.

spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "yarn" // FILL IN YOUR MASTER HERE
     // num-executors = 3
     // executor-memory = "XXXXXXX" // FILL IN YOUR EXECUTOR MEMORY
    }
    conf = {
      // Any configuration you need for your setup goes here, like:
      "spark.executor.cores" = "3"
      "spark.executor.memory" = "5g"
      "spark.driver.memory"   = "5g"
      // "spark.dynamicAllocation.enabled" = "false"
    }
    workload-suites = [
      {
        descr = "Generate a dataset"
#        benchmark-output = "hdfs:///tmp/km/results-data-gen.csv"
        workloads = [
          {
            name = "data-generation-lr"
//            rows = 10000000 // takes 1m to create
            rows = 50000000 // takes 8min to create 
            cols = 24
            output = "hdfs:///tmp/data/data-50M.parquet"
          }
        ]
      }
    ]
  }]
}

How should I generate a proper dataset and run the logistic regression test?
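
For reference, the stacktrace suggests that lr-bml loads its input as plain text and parses each comma-separated field with toDouble (see LogisticRegressionWorkload.scala:75), which would explain the NumberFormatException when it is pointed at a Parquet file: the binary Parquet bytes get read as text. The loader below is only a sketch inferred from the stacktrace, not the actual spark-bench code, and the path is the same hypothetical HDFS location used in the config above:

import org.apache.spark.sql.SparkSession

object LrInputParseSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("lr-input-parse-sketch").getOrCreate()

    // Hypothetical path; stands in for the hdfs:///tmp/data/data-10M.parquet input above.
    val path = "hdfs:///tmp/data/data-10M.parquet"

    // Reading a Parquet file as plain text yields binary bytes, and calling
    // .toDouble on such a "field" throws java.lang.NumberFormatException --
    // the same error shown in the stacktrace above.
    val parsed = spark.sparkContext
      .textFile(path)
      .map(_.split(",").map(_.toDouble))

    parsed.count() // the action that triggers the parse and the failure
    spark.stop()
  }
}

If that reading is right, lr-bml expects a CSV-style text file rather than Parquet, so the mismatch is between what the data generator writes and what the workload reads.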

wtx626 commented 5 years ago

I have the same problem. Is the header of the data what causes the

Caused by: java.lang.NumberFormatException: For input string: "�b���<0�.N����."
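
A non-numeric header row would indeed raise the same exception type when each field is parsed with toDouble, although the unreadable characters in the message above look more like binary Parquet content than a text header. A minimal, illustrative Scala snippet of the header case:

// Illustrative only: parsing a header line the same way the workload appears
// to parse data rows (split on commas, then toDouble on each field).
val headerLine = "label,c0,c1,c2"            // hypothetical header row
val values = headerLine.split(",").map(_.toDouble)
// => java.lang.NumberFormatException: For input string: "label"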