ExpediaGroup / circus-train

Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
Apache License 2.0
86 stars 15 forks source link

NPE in S3MapReduceCpCopier on job failure #14

Closed massdosage closed 5 years ago

massdosage commented 6 years ago

When a Circus Train job using the S3MapReduceCpCopier fails it reports a NPE in the logs, this doesn't appear to impact the overall job result (which is a failure) but it's a distraction from whatever the real cause of failure is.

Here is an example log output showing the problem:

08-01-2018 04:39:34 PST lz_tvg_daily_guidance ERROR - AttemptID:attempt_1504697735424_23539_m_000002_1 Timed out after 600 secs 08-01-2018 04:39:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:39:35,086 INFO org.apache.hadoop.mapreduce.Job:1367 - map 88% reduce 0% 08-01-2018 04:41:48 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:41:48,320 INFO org.apache.hadoop.mapreduce.Job:1367 - map 89% reduce 0% 08-01-2018 04:42:34 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:42:34,400 INFO org.apache.hadoop.mapreduce.Job:1406 - Task Id : attempt_1504697735424_23539_m_000004_2, Statu s : FAILED 08-01-2018 04:42:34 PST lz_tvg_daily_guidance ERROR - AttemptID:attempt_1504697735424_23539_m_000004_2 Timed out after 600 secs 08-01-2018 04:42:34 PST lz_tvg_daily_guidance ERROR - Container killed by the ApplicationMaster. 08-01-2018 04:42:34 PST lz_tvg_daily_guidance ERROR - Container killed on request. Exit code is 143 08-01-2018 04:42:34 PST lz_tvg_daily_guidance ERROR - Container exited with a non-zero exit code 143 08-01-2018 04:42:34 PST lz_tvg_daily_guidance ERROR - 08-01-2018 04:42:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:42:35,404 INFO org.apache.hadoop.mapreduce.Job:1367 - map 86% reduce 0% 08-01-2018 04:45:26 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:45:26,697 INFO org.apache.hadoop.mapreduce.Job:1367 - map 87% reduce 0% 08-01-2018 04:47:09 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:47:09,868 INFO org.apache.hadoop.mapreduce.Job:1367 - map 88% reduce 0% 08-01-2018 04:48:22 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:48:22,999 INFO org.apache.hadoop.mapreduce.Job:1367 - map 89% reduce 0% 08-01-2018 04:49:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:49:35,122 INFO org.apache.hadoop.mapreduce.Job:1367 - map 90% reduce 0% 08-01-2018 04:50:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:50:35,226 INFO org.apache.hadoop.mapreduce.Job:1367 - map 91% reduce 0% 08-01-2018 04:54:04 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:54:04,598 INFO org.apache.hadoop.mapreduce.Job:1367 - map 92% reduce 0% 08-01-2018 04:56:37 PST lz_tvg_daily_guidance INFO - 2018-01-08 12:56:37,875 INFO org.apache.hadoop.mapreduce.Job:1367 - map 93% reduce 0% 08-01-2018 05:01:15 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:01:15,369 INFO org.apache.hadoop.mapreduce.Job:1367 - map 94% reduce 0% 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:03:35,650 INFO org.apache.hadoop.mapreduce.Job:1367 - map 100% reduce 0% 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:03:35,655 INFO org.apache.hadoop.mapreduce.Job:1380 - Job job_1504697735424_23539 failed with state FAILED du e to: Task failed task_1504697735424_23539_m_000000 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Job failed as tasks failed. failedMaps:1 failedReduces:0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:03:35,668 INFO org.apache.hadoop.mapreduce.Job:1385 - Counters: 35 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - File System Counters 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - FILE: Number of bytes read=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - FILE: Number of bytes written=4184453 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - FILE: Number of read operations=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - FILE: Number of large read operations=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - FILE: Number of write operations=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - HDFS: Number of bytes read=70199372730 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - HDFS: Number of bytes written=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - HDFS: Number of read operations=3140 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - HDFS: Number of large read operations=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - HDFS: Number of write operations=36 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Job Counters 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Failed map tasks=17 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Killed map tasks=2 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Launched map tasks=37 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Other local map tasks=37 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Total time spent by all maps in occupied slots (ms)=2424258760 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Total time spent by all reduces in occupied slots (ms)=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Total time spent by all map tasks (ms)=43290335 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Total vcore-milliseconds taken by all map tasks=43290335 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Total megabyte-milliseconds taken by all map tasks=77576280320 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Map-Reduce Framework 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Map input records=1458 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Map output records=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Input split bytes=2628 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Spilled Records=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Failed Shuffles=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Merged Map outputs=0 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - GC time elapsed (ms)=3518 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - CPU time spent (ms)=1872350 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Physical memory (bytes) snapshot=10116829184 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Virtual memory (bytes) snapshot=64700481536 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Total committed heap usage (bytes)=19780861952 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - com.hotels.bdp.circustrain.s3mapreducecp.mapreduce.Counter 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - BYTESCOPIED=70121513187 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - BYTESEXPECTED=70121513187 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - COPY=1458 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - File Input Format Counters 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Bytes Read=409236 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - File Output Format Counters 08-01-2018 05:03:35 PST lz_tvg_daily_guidance INFO - Bytes Written=0 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:03:41,329 INFO org.apache.hadoop.mapred.ClientServiceDelegate:278 - Application state is completed. FinalAppl icationStatus=FAILED. Redirecting to job history server 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:03:41,348 ERROR com.hotels.bdp.circustrain.core.event.CompositeCopierListener:54 - Listener 'com.hotels.bdp.ci rcustrain.core.event.MetricsListener@42d6aa88' threw exception onFailure. 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - java.lang.NullPointerException 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier$1.getValue(S3MapReduceCpCopier.java:157) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier$1.getValue(S3MapReduceCpCopier.java:153) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.shaded.com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.shaded.com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:171) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.shaded.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.event.MetricsListener.copierEnd(MetricsListener.java:83) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.event.CompositeCopierListener.copierEnd(CompositeCopierListener.java:52) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.PartitionedTableReplication.replicate(PartitionedTableReplication.java:123) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.Locomotive.run(Locomotive.java:114) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:797) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:787) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:777) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.run(SpringApplication.java:308) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:87) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at java.lang.reflect.Method.invoke(Method.java:498) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - 2018-01-08 13:03:41,348 ERROR com.hotels.bdp.circustrain.core.Locomotive:121 - Failed to replicate: source-cluster:db.source to aws-dest-cluster:db.dest 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - com.hotels.bdp.circustrain.api.CircusTrainException: Unable to replicate 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.PartitionedTableReplication.replicate(PartitionedTableReplication.java:136) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.Locomotive.run(Locomotive.java:114) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:797) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:787) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:777) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.springframework.boot.SpringApplication.run(SpringApplication.java:308) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:87) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at java.lang.reflect.Method.invoke(Method.java:498) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - Caused by: com.hotels.bdp.circustrain.api.CircusTrainException: Unable to copy file(s) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier.copy(S3MapReduceCpCopier.java:147) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.core.PartitionedTableReplication.replicate(PartitionedTableReplication.java:121) 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - ... 12 more 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - Caused by: java.io.IOException: S3MapReduceCp failure: Job job_1504697735424_23539 has failed: Task failed task_150469773542 4_23539_m_000000 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - Job failed as tasks failed. failedMaps:1 failedReduces:0 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - 08-01-2018 05:03:41 PST lz_tvg_daily_guidance INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier.copy(S3MapReduceCpCopier.java:140)

and another

08-01-2018 23:40:01 PST etl_hcom_hit_data_cleansed ERROR - AttemptID:attempt_1504697735424_23693_m_000012_2 Timed out after 600 secs 08-01-2018 23:40:02 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:40:02,412 INFO org.apache.hadoop.mapreduce.Job:1367 - map 92% reduce 0% 08-01-2018 23:40:29 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:40:29,462 INFO org.apache.hadoop.mapreduce.Job:1367 - map 93% reduce 0% 08-01-2018 23:42:47 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:42:47,699 INFO org.apache.hadoop.mapreduce.Job:1367 - map 94% reduce 0% 08-01-2018 23:44:46 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:44:46,904 INFO org.apache.hadoop.mapreduce.Job:1367 - map 95% reduce 0% 08-01-2018 23:49:10 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:49:10,357 INFO org.apache.hadoop.mapreduce.Job:1367 - map 96% reduce 0% 08-01-2018 23:52:31 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:52:31,703 INFO org.apache.hadoop.mapreduce.Job:1367 - map 97% reduce 0% 08-01-2018 23:55:41 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 07:55:41,026 INFO org.apache.hadoop.mapreduce.Job:1367 - map 98% reduce 0% 09-01-2018 00:02:32 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:02:32,797 INFO org.apache.hadoop.mapreduce.Job:1367 - map 99% reduce 0% 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:02,811 INFO org.apache.hadoop.mapreduce.Job:1367 - map 100% reduce 0% 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:02,815 INFO org.apache.hadoop.mapreduce.Job:1380 - Job job_1504697735424_23693 failed with state FAIL ED due to: Task failed task_1504697735424_23693_m_000012 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Job failed as tasks failed. failedMaps:1 failedReduces:0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:02,829 INFO org.apache.hadoop.mapreduce.Job:1385 - Counters: 34 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - File System Counters 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - FILE: Number of bytes read=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - FILE: Number of bytes written=4649770 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - FILE: Number of read operations=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - FILE: Number of large read operations=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - FILE: Number of write operations=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - HDFS: Number of bytes read=472789110973 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - HDFS: Number of bytes written=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - HDFS: Number of read operations=6015 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - HDFS: Number of large read operations=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - HDFS: Number of write operations=40 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Job Counters 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Failed map tasks=15 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Launched map tasks=35 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Other local map tasks=35 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Total time spent by all maps in occupied slots (ms)=4458162016 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Total time spent by all reduces in occupied slots (ms)=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Total time spent by all map tasks (ms)=79610036 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Total vcore-milliseconds taken by all map tasks=79610036 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Total megabyte-milliseconds taken by all map tasks=142661184512 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Map-Reduce Framework 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Map input records=2868 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Map output records=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Input split bytes=2880 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Spilled Records=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Failed Shuffles=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Merged Map outputs=0 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - GC time elapsed (ms)=11623 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - CPU time spent (ms)=11317780 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Physical memory (bytes) snapshot=9264287744 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Virtual memory (bytes) snapshot=72171884544 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Total committed heap usage (bytes)=17369661440 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - com.hotels.bdp.circustrain.s3mapreducecp.mapreduce.Counter 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - BYTESCOPIED=471995148020 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - BYTESEXPECTED=471995148020 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - COPY=2868 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - File Input Format Counters 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Bytes Read=611761 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - File Output Format Counters 09-01-2018 00:21:02 PST etl_hcom_hit_data_cleansed INFO - Bytes Written=0 09-01-2018 00:21:13 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:13,971 INFO org.apache.hadoop.ipc.Client:871 - Retrying connect to server: FOO. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 09-01-2018 00:21:14 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:14,972 INFO org.apache.hadoop.ipc.Client:871 - Retrying connect to server:FOO. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 09-01-2018 00:21:15 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:15,973 INFO org.apache.hadoop.ipc.Client:871 - Retrying connect to server:FOO. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:16,078 INFO org.apache.hadoop.mapred.ClientServiceDelegate:278 - Application state is completed. Fina lApplicationStatus=FAILED. Redirecting to job history server 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:16,092 ERROR com.hotels.bdp.circustrain.core.event.CompositeCopierListener:54 - Listener 'com.hotels.b dp.circustrain.core.event.MetricsListener@412de677' threw exception onFailure. 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - java.lang.NullPointerException 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier$1.getValue(S3MapReduceCpCopier.java:157) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier$1.getValue(S3MapReduceCpCopier.java:153) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.shaded.com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.shaded.com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:171) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.shaded.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.event.MetricsListener.copierEnd(MetricsListener.java:83) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.event.CompositeCopierListener.copierEnd(CompositeCopierListener.java:52) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.PartitionedTableReplication.replicate(PartitionedTableReplication.java:123) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.Locomotive.run(Locomotive.java:114) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:797) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:787) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:777) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.run(SpringApplication.java:308) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:87) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at java.lang.reflect.Method.invoke(Method.java:498) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - 2018-01-09 08:21:16,093 ERROR com.hotels.bdp.circustrain.core.Locomotive:121 - Failed to replicate: chandler-prod:etlda ta.etl_hcom_hit_data_cleansed to aws-hcom-data-warehouse-common:bix_clickstream.etl_hcom_hit_data_cleansed. 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - com.hotels.bdp.circustrain.api.CircusTrainException: Unable to replicate 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.PartitionedTableReplication.replicate(PartitionedTableReplication.java:136) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.Locomotive.run(Locomotive.java:114) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:797) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:787) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:777) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.springframework.boot.SpringApplication.run(SpringApplication.java:308) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:87) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at java.lang.reflect.Method.invoke(Method.java:498) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - Caused by: com.hotels.bdp.circustrain.api.CircusTrainException: Unable to copy file(s) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopier.copy(S3MapReduceCpCopier.java:147) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - at com.hotels.bdp.circustrain.core.PartitionedTableReplication.replicate(PartitionedTableReplication.java:121) 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - ... 12 more 09-01-2018 00:21:16 PST etl_hcom_hit_data_cleansed INFO - Caused by: java.io.IOException: S3MapReduceCp failure: Job job_1504697735424_23693 has failed: Task failed task_1504697 735424_23693_m_000012

massdosage commented 5 years ago

It looks like something is going wrong reporting a value for a metric when a failure has occurred, possibly it has no value and is null which triggers some of NPE. We haven't had any end users report this since this was originally raised and a lot of this code has moved around recently. Closing for now, will re-open if reported again.