breser / git2consul

Mirrors the contents of a git repository into Consul KVs.
Other
764 stars 164 forks source link

Randomly deleting keys #126

Open guillaumeger opened 7 years ago

guillaumeger commented 7 years ago

It seems that git2consul is deleting keys when there is a problem with consul:

{"name":"git2consul","hostname":"git2consul-1489113718-3a5mk","pid":1,"level":40,"msg":"Failed to handle pull due to Error: consul: kv.get: rpc error: No cluster leader, clearing cache and retrying","time":"2016-11-23T23:56:35.436Z","v":0}

{"name":"git2consul","hostname":"git2consul-1489113718-3a5mk","pid":1,"level":40,"msg":"Purging branch cache /var/lib/git2consul_cache/lp/production for branch production in repo lp","time":"2016-11-23T23:56:35.436Z","v":0}

{"name":"git2consul","hostname":"git2consul-1489113718-3a5mk","pid":1,"level":10,"msg":"Deleting key lp/production/faq-bo","time":"2016-11-23T23:56:36.186Z","v":0}

{"name":"git2consul","hostname":"git2consul-1489113718-3a5mk","pid":1,"level":50,"msg":"Uncaught exception ReferenceError: key_name is not defined","time":"2016-11-23T23:56:36.234Z","v":0}

calvn commented 7 years ago

Did some quick visual stack tracking and problem seems to likely stem from Branch.prototype.handleRefChange() -> Branch.prototype.pull() -> getLastProcessedRef() -> consul.kv.get().

handleRefChange() should perform retries on consul-related error instead for purging the cache.

samifruit514 commented 7 years ago

Hello calvn,

I was taking a look at the code and I'm trying to understand how did you figure out that the error came from a consul.kv.get call....

From what I understand after reading guillaumeger's logs, the key_name is not defined would come from "consul.kv.del" in the file_deleted function (right after Deleting key logger trace)

Thanks a lot for the clues!

calvn commented 7 years ago

@samifruit514 you are right, it seems that the exception is thrown by consul.kv.del().

  1. The consul network experiences a hiccup during polling.
  2. git2consul triggers a cache purge.
  3. Local cache is emptied which triggers deletion on KVs.
  4. consul.kv.del() gets called but it's unable to perform the deletion.

Now, I wonder why it thinks key_name is undefined if the previous message logged key_name as "lp/production/faq-bo".

samifruit514 commented 7 years ago

Hello calvn,

We just noticed that we had this error message when running with 0.12.12 which has few paths where key_name is not defined (in lib/consul/index.js):

cb('Failed to delete key ' + key_name + ' due to ' + err);

These calls are located in handle_expanded_keys_with_different_file_types function.

Again, this is in 0.12.12. 0.12.13 seem to had fixed this.

I will try to reproduce this error in 0.12.12 to be sure this wont happend in 0.12.13 once we run it in our production environment.

Thanks again!

samifruit514 commented 7 years ago

hello @calvn ,

with 0.12.13, we have problems exactly like you described in your first reply in this thread. I made a PR that retries when the connection with consul fails.

Please let me know what you think about it https://github.com/Cimpress-MCP/git2consul/pull/143

Thanks a lot!