pmangg closed this issue 8 years ago
Is there any way we can restart Camus from that day to backfill the missing data? The data is used by reportify-checkout-events-view and the checkout fact tables (home cards, Tableau, etc.).
So you are only missing data for partition 12? I will make a backup of that data right now, as we are close to the 7-day buffer that's kept on the brokers.
On Dec 2, 2015, at 20:31, Putra Manggala notifications@github.com wrote:
We're missing some checkout data in HDFS (Shopify/starscream#8162) and it looks to be shop-specific, i.e., a set of shops stopped having checkout Kafka data since the 26th. Digging in further, this data is in Kafka but it's just not being written to HDFS. In a Camus log, I see:
02-12-2015 15:30:29 EST Camus INFO - [CamusJob] - Offset range from kafka metadata is outside the previously persisted offset, checkout uri:tcp://kafka08.chi.shopify.com:9092 leader:8 partition:12 earliest_offset:240498314 offset:346216743 latest_offset:332230367 avg_msg_size:1343 estimated_size:-18783702968
02-12-2015 15:30:29 EST Camus INFO - Topic checkout will be skipped.
02-12-2015 15:30:29 EST Camus INFO - Please check whether kafka cluster configuration is correct. You can also specify config parameter: kafka.move.to.earliest.offset to start processing from earliest kafka metadata offset.
The first Camus run where this log message started occurring is https://azkaban.data.shopify.com/executor?execid=149416&job=Camus and we stopped getting checkout Kafka data in HDFS for that set of shops from then onwards.
cc @Shopify/data-acquisition @angelini
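For anyone reading the log line above, here is a minimal Python sketch of what the numbers say. It only parses the fields that appear in that line and checks whether the persisted offset falls inside the range the broker reports. The function names (`parse_offsets`, `diagnose`) are hypothetical, and the size formula is inferred from the logged values rather than taken from the Camus source.

```python
import re

# Log line copied from the Camus output quoted above.
LOG_LINE = (
    "02-12-2015 15:30:29 EST Camus INFO - [CamusJob] - Offset range from kafka "
    "metadata is outside the previously persisted offset, checkout "
    "uri:tcp://kafka08.chi.shopify.com:9092 leader:8 partition:12 "
    "earliest_offset:240498314 offset:346216743 latest_offset:332230367 "
    "avg_msg_size:1343 estimated_size:-18783702968"
)

# Only the numeric fields that are visible in the log line.
FIELDS = (
    "partition",
    "earliest_offset",
    "offset",
    "latest_offset",
    "avg_msg_size",
    "estimated_size",
)


def parse_offsets(line):
    """Extract the numeric key:value fields from one log line (hypothetical helper)."""
    values = {}
    for name in FIELDS:
        # (?<!\S) requires whitespace (or start of line) before the key, so
        # "offset" does not match inside "earliest_offset" or "latest_offset".
        match = re.search(r"(?<!\S)" + name + r":(-?\d+)", line)
        if match:
            values[name] = int(match.group(1))
    return values


def diagnose(values):
    """Return (in_range, messages_ahead, estimated_size) for one partition."""
    in_range = values["earliest_offset"] <= values["offset"] <= values["latest_offset"]
    # How far the persisted offset is past the newest offset on the broker.
    messages_ahead = values["offset"] - values["latest_offset"]
    # Assumption: size is (latest - persisted) * average message size,
    # which reproduces the estimated_size printed in the log.
    estimated_size = (values["latest_offset"] - values["offset"]) * values["avg_msg_size"]
    return in_range, messages_ahead, estimated_size


if __name__ == "__main__":
    parsed = parse_offsets(LOG_LINE)
    in_range, ahead, size = diagnose(parsed)
    print("partition", parsed["partition"], "in range:", in_range)
    print("persisted offset ahead of latest by", ahead, "messages")
    print("estimated size:", size)
```

Under that assumption, the persisted offset for partition 12 is past the broker's latest offset, which is why the estimated size is negative and the topic gets skipped.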
That "Topic checkout will be skipped" message started happening in https://azkaban.data.shopify.com/executor?execid=149416&job=Camus for a bunch of partitions, not just partition 12. However, in the latest run, only partition 12 has that message. Looking at the shops in all these partitions (the key for the checkout topic is shop_id), only shops from partition 12 are missing data from that latest run on the 25th.
Should be fixed by https://github.com/Shopify/cookbooks/pull/9245