bigdatagenomics / avocado

A Variant Caller, Distributed. Apache 2 licensed.
http://bdgenomics.org/projects/avocado/

DiscoverVariants runtime differs greatly between repeated runs with the same args. #231

Open xubo245 opened 7 years ago

xubo245 commented 7 years ago

I use DiscoverVariants (org.bdgenomics.avocado.cli.DiscoverVariants) to discover variants. The data is 8 million paired-end reads (simulated with wgsim).

However, the runtime differs greatly between repeated runs with the same args.

I have tried this many times.

code:

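    import org.apache.spark.SparkContext
    import org.bdgenomics.adam.rdd.ADAMContext
    import org.bdgenomics.avocado.cli.DiscoverVariants

    // Wall-clock time of one DiscoverVariants run, including SparkContext startup and shutdown.
    val startTime = System.currentTimeMillis()
    val sam = args(0) // input alignments
    val out = args(1) // output path
    val appArgs = "sam:" + sam + "\tout:" + out
    val sc = new SparkContext(conf) // conf is a SparkConf defined elsewhere (not shown)
    val ac = new ADAMContext(sc) // not used below
    DiscoverVariants(Array(sam, out)).run(sc)
    sc.stop()
    val stopTime = System.currentTimeMillis()
    println(appArgs + "\ttime:\t" + (stopTime - startTime) / 1000.0 + "\t")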

time:

hadoop@Master:~/disk2/xubo/project/callVariant/GCDSS$ tail -f discoverVariantDiffNumTesttime201704152123.txt 
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI6.adam  time:   902.211 
Apr 15, 2017 9:24:03 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 9:38:57 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI7.adam  time:   325.683 
Apr 15, 2017 9:39:08 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 9:44:26 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5

fnothaft commented 7 years ago

Hi @xubo245 !

Sorry for the slow reply; I hadn't seen this issue when it came in! Are you running this locally? If you run it 5 times, what do the runtimes look like? It is possible that you're seeing a warm-up phenomenon (e.g., file system buffering).
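
To compare repeated runs, it can help to reduce the timing lines to a per-input summary. The following is a minimal Scala sketch and not part of avocado: it assumes the log contains lines of the form `sam:<path> out:<path> time: <seconds>`, as printed by the snippet in this issue, and the object name `RuntimeSummary` and its methods are hypothetical.

```scala
// Sketch: summarize wall-clock times from "sam:... out:... time: <seconds>" lines.
object RuntimeSummary {
  // Fields may be separated by tabs or spaces; trailing whitespace is allowed.
  private val Line = """^sam:(\S+)\s+out:\S+\s+time:\s+([0-9.]+)\s*$""".r

  /** Returns (input path, seconds) for each timing line; all other lines are skipped. */
  def parse(lines: Seq[String]): Seq[(String, Double)] =
    lines.collect { case Line(sam, secs) => (sam, secs.toDouble) }

  /** Groups runs by input path and reports (count, min, median, max) in seconds. */
  def summarize(runs: Seq[(String, Double)]): Map[String, (Int, Double, Double, Double)] =
    runs.groupBy(_._1).map { case (sam, rs) =>
      val t = rs.map(_._2).sorted
      val mid = t.size / 2
      val median = if (t.size % 2 == 1) t(mid) else (t(mid - 1) + t(mid)) / 2.0
      sam -> (t.size, t.head, median, t.last)
    }

  def main(args: Array[String]): Unit = {
    // args(0): path to the file holding the timing output
    val src = scala.io.Source.fromFile(args(0))
    try summarize(parse(src.getLines().toList)).foreach(println)
    finally src.close()
  }
}
```

Run it with the path of the timing file as the only argument. A large gap between the minimum and the median for the same input points to run-to-run variance rather than an effect of input size.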

xubo245 commented 7 years ago

I run avocado on a cluster in Spark standalone mode.

Results of 5 runs per input:

sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1DiscoverVariantI1.adam      time:   308.092
Apr 15, 2017 9:24:20 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 9:29:19 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1DiscoverVariantI2.adam      time:   295.197
Apr 15, 2017 9:29:30 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 9:34:17 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1DiscoverVariantI3.adam      time:   296.947
Apr 15, 2017 9:34:29 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 9:39:17 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1DiscoverVariantI4.adam      time:   301.25
Apr 15, 2017 9:39:29 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 9:44:22 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c16000000Nhs20Paired12time1000num32k1DiscoverVariantI5.adam      time:   935.559
Apr 15, 2017 9:44:33 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 10:00:01 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI1.adam      time:   1019.173
Apr 15, 2017 10:00:12 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 10:17:03 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI2.adam      time:   1043.837
Apr 15, 2017 10:17:14 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 10:34:30 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI3.adam      time:   331.459
Apr 15, 2017 10:34:41 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 10:40:04 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI4.adam      time:   1034.815
Apr 15, 2017 10:40:16 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 10:57:23 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI5.adam      time:   1034.985
Apr 15, 2017 10:57:34 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 11:14:41 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI1.adam      time:   1142.983
Apr 15, 2017 11:14:53 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 11:33:47 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI2.adam      time:   1148.81
Apr 15, 2017 11:33:59 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 11:52:59 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI3.adam      time:   1161.003
Apr 15, 2017 11:53:10 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 12:12:23 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI4.adam      time:   1122.803
Apr 15, 2017 12:12:35 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 15, 2017 12:31:09 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam       out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI5.adam      time:   1139.485
Apr 15, 2017 12:31:21 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32

xubo245 commented 7 years ago

I ran it 20 times:

sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI11.adam time:   611.29  
Apr 16, 2017 12:07:07 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 12:17:09 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI12.adam time:   1041.574    
Apr 16, 2017 12:17:21 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 12:34:34 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI13.adam time:   1046.914    
Apr 16, 2017 12:34:45 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 12:52:04 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI14.adam time:   978.584 
Apr 16, 2017 12:52:16 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 1:08:26 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI15.adam time:   1017.985    
Apr 16, 2017 1:08:37 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 1:25:25 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI16.adam time:   1037.953    
Apr 16, 2017 1:25:38 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 1:42:47 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI17.adam time:   1026.294    
Apr 16, 2017 1:42:59 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 1:59:57 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI18.adam time:   1000.112    
Apr 16, 2017 2:00:08 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 2:16:40 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI19.adam time:   1031.265    
Apr 16, 2017 2:16:52 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 2:33:54 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI20.adam time:   1033.526    
Apr 16, 2017 2:34:05 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 2:51:10 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI21.adam time:   335.346 
Apr 16, 2017 2:51:21 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 2:56:48 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI22.adam time:   333.994 
Apr 16, 2017 2:57:00 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 3:02:25 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI23.adam time:   1011.96 
Apr 16, 2017 3:02:37 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 3:19:20 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI24.adam time:   1006.177    
Apr 16, 2017 3:19:32 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 3:36:10 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI25.adam time:   1038.076    
Apr 16, 2017 3:36:22 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 3:53:31 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI26.adam time:   1030.243    
Apr 16, 2017 3:53:42 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 4:10:44 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI27.adam time:   1033.402    
Apr 16, 2017 4:10:56 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 4:28:01 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI28.adam time:   1017.483    
Apr 16, 2017 4:28:13 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 4:45:02 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI29.adam time:   1007.373    
Apr 16, 2017 4:45:14 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 5:01:53 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c18000000Nhs20Paired12time1000num32k1DiscoverVariantI30.adam time:   902.883 
Apr 16, 2017 5:02:04 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 5:16:59 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI11.adam time:   1116.18 
Apr 16, 2017 5:17:11 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 5:35:38 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI12.adam time:   1086.454    
Apr 16, 2017 5:35:49 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 5:53:47 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI13.adam time:   1109.689    
Apr 16, 2017 5:53:59 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 6:12:20 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI14.adam time:   1130.608    
Apr 16, 2017 6:12:31 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 6:31:14 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI15.adam time:   1146.735    
Apr 16, 2017 6:31:25 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 6:50:23 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI16.adam time:   1141.368    
Apr 16, 2017 6:50:35 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 7:09:28 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI17.adam time:   1136.241    
Apr 16, 2017 7:09:39 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 7:28:27 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI18.adam time:   1144.389    
Apr 16, 2017 7:28:38 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 7:47:34 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI19.adam time:   1138.622    
Apr 16, 2017 7:47:46 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 8:06:36 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI20.adam time:   1119.333    
Apr 16, 2017 8:06:48 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 8:25:18 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI21.adam time:   360.353 
Apr 16, 2017 8:25:30 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 8:31:22 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI22.adam time:   1101.976    
Apr 16, 2017 8:31:34 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 8:49:47 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI23.adam time:   1183.02 
Apr 16, 2017 8:49:58 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 9:09:34 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI24.adam time:   1088.011    
Apr 16, 2017 9:09:45 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 9:27:45 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI25.adam time:   1115.471    
Apr 16, 2017 9:27:56 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 9:46:23 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI26.adam time:   1134.819    
Apr 16, 2017 9:46:34 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 10:05:21 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI27.adam time:   1127.239    
Apr 16, 2017 10:05:32 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 10:24:11 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI28.adam time:   1122.376    
Apr 16, 2017 10:24:23 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 10:42:56 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI29.adam time:   369.243 
Apr 16, 2017 10:43:08 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 10:49:09 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
sam:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1.adam   out:/xubo/project/alignment/CloudBWA/g38/time/cloudBWAnewg38L50c20000000Nhs20Paired12time1000num32k1DiscoverVariantI30.adam time:   1133.132    
Apr 16, 2017 10:49:20 AM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 32
Apr 16, 2017 11:08:05 AM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5

fnothaft commented 7 years ago

Oh yeah, those times are really all over the map! Do you have the job history server enabled on your Spark cluster? If so, I am wondering if you have any failed tasks during the slower runs?
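
If event logging is enabled (`spark.eventLog.enabled`, which the history server relies on), failed tasks can also be counted directly from an application's event log. Below is a minimal sketch, assuming an uncompressed log with one JSON event per line. The `Task End Reason` and `Reason` field names are assumed to match Spark's JSON event format and should be checked against your Spark version. `TaskEndReasons` is a hypothetical name.

```scala
// Sketch: count how tasks ended, based on the lines of a Spark event log.
object TaskEndReasons {
  // Extracts the "Reason" value from the "Task End Reason" object of an event line.
  private val Reason =
    """"Task End Reason"\s*:\s*\{\s*"Reason"\s*:\s*"([^"]+)"""".r

  // Only task-end events carry the field we are interested in.
  private def isTaskEnd(line: String): Boolean =
    line.contains("\"SparkListenerTaskEnd\"")

  /** Maps each end reason (e.g. "Success") to the number of tasks that ended that way. */
  def countReasons(lines: Iterator[String]): Map[String, Int] =
    lines
      .filter(isTaskEnd)
      .flatMap(l => Reason.findFirstMatchIn(l).map(_.group(1)))
      .foldLeft(Map.empty[String, Int]) { (acc, r) =>
        acc.updated(r, acc.getOrElse(r, 0) + 1)
      }

  def main(args: Array[String]): Unit = {
    // args(0): path to one application's event log file
    val src = scala.io.Source.fromFile(args(0))
    try countReasons(src.getLines()).foreach(println)
    finally src.close()
  }
}
```

Any reason other than `Success` in the output for a slow run would be worth comparing against the same count for a fast run.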