duritong / puppet-trocla

puppet integration for trocla
11 stars 18 forks source link

Investigate connection leaks in trocla-hiera-backend #25

Closed duritong closed 4 years ago

duritong commented 6 years ago

As reported on puppet-users, it seems that the hiera-backend is leaking stores:

https://groups.google.com/forum/#!topic/puppet-users/JDfXiA1xnCg

The workaround proposed on the list goes into the same direction as what we discussed in https://github.com/duritong/puppet-trocla/pull/18 and hence is introducing the same inefficiency (re-opening database connection), that I tried to avoid in the patches that followed to that PR.

Although, I explicitly tried to not run into the same problem as in the original puppet function with the hiera-backend, it looks like we are creating objects over and over again and hence keeping tons of connections open. Though that is my current guess and the little info we have on the report on the list, but it's the only explanation I have so far for the growing connection count.

What might interfere is the way how puppet is setting up the caches for hiera backends and eventually we might need to try to do that differently than storing the trocla object in the hash. But that's a pure guess atm, that needs more verification and analysis.

This is mainly a brain dump at the moment, to not loose track on what came up so far.

Poil commented 6 years ago

Hi,

On this node we have 2 call to trocla in the manifest We also have a call via hiera

files:
  /tmp/toto:
    content: "%{hiera('trocla_lookup::plain::mykey')}"

This resource is created via a create_resource in our main manifest

At each run, a new connection is opened and not closed. We have patched lib/puppet/functions/trocla_lookup_key.rb

diff --git a/lib/puppet/functions/trocla_lookup_key.rb b/lib/puppet/functions/trocla_lookup_key.rb
index d377ec8..f61df46 100644
--- a/lib/puppet/functions/trocla_lookup_key.rb
+++ b/lib/puppet/functions/trocla_lookup_key.rb
@@ -33,6 +33,8 @@ Puppet::Functions.create_function(:trocla_lookup_key) do
       trocla_hierarchy(trocla_key, format, opts)
     end

+    @trocla.close
+
     context.not_found unless res
     context.cache(key, res)
   end 

This seems to resolve this leak. We need to test more to see if there is no leak with the call in the manifests.

Also we use these gem

bcrypt (3.1.11 java)
fast_gettext (1.1.0)
gettext (3.2.2)
gettext-setup (0.26)
highline (1.7.8)
hocon (1.1.3)
jar-dependencies (0.2.6)
jdbc-mysql (5.1.40)
jruby-openssl (0.9.16 java)
json (1.8.0 java)
locale (2.1.2)
moneta (1.0.0)
rake (10.1.0)
rdoc (4.1.2)
semantic_puppet (0.1.3)
sequel (4.46.0)
text (1.3.1)
trocla (0.3.0)

And this config

---
encryption: :ssl
encryption_options:
  :private_key: /etc/puppetlabs/trocla/ssl/trocla_private_key.pem
  :public_key: /etc/puppetlabs/trocla/ssl/trocla_public_key.pem
options:
  expires: false
  length: 16
  random: true
profiles:
  sysdomain_nc:
    name_constraints:
    - xxx.xxx.xx.xx
store_options:
  adapter: :Sequel
  adapter_options:
    :db: jdbc:mysql://mysqlmaster-trocla.xxxx.xxx.xxx/trocla
    :password: xxxxxxxxxxxxxxxxx
    :port: 3306
    :table: trocla
    :user: trocla

Best regards

duritong commented 6 years ago

Great, thank you a lot for the information.

So indicating that it is one connection per run and you have one hiera-trocla call, I think the remaining leak is there.

So now I'm wondering why the cache is not working as expected, but for that I need to dig further into how it works.

I'm glad to hear the patch you did works, though I don't see it as a long-term solution, as a) it works around the actual problem and b) it makes trocla inefficent on high usage, so I really would like to find the actual problem and a way to properly fix it.

2 remaining questions:

Poil commented 6 years ago

Hi,

If needed I can do more test and patch our test server :)

Here is our big hiera configuration

---
version: 5
defaults:
  datadir: '/etc/puppetlabs/code/hieradata'
  data_hash: yaml_data

hierarchy:
# FQDN (ex: custproj-pw01.adm.fr)
  - name: Per fqdn data
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/fqdn/%{::fqdn}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Client Cert (ex: custproj-pw01.adm.fr)
  - name: Per clientcert data
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/clientcert/%{::trusted.certname}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Role + SubRole + Platform (ex: webserver_prx_production)
  - name: Per role/subrole/platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/%{::subrole}/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Role + Platform (ex: webserver_production)
  - name: Per role/platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/all/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Subrole + Platform (ex: production)
  - name: Per subrole/platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/all/%{::subrole}/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Platform (ex: production)
  - name: Per platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/all/all/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Role + SubRole (ex: webserver_prx)
  - name: Per Role/Subrole
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/%{::subrole}/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Role (ex: webserver)
  - name: Per Role
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/all/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Domain
  - name: Per Domain
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/other/%{::domain}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# AWS AZ
  - name: Per AWS AZ
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/other/%{::ec2_placement_availability_zone}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Project 
  - name: Per Project
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/all/all/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Client
  - name: Per Client
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
# Common
  - name: Per SOE/Location/AZ
    path: hieradata/%{::soe_version}/%{::location}/%{::ec2_placement_availability_zone}/common.yaml
  - name: Per SOE/Domain
    path: hieradata/%{::soe_version}/%{::domain}/common.yaml
  - name: Per SOE/Location
    path: hieradata/%{::soe_version}/%{::location}/common.yaml
  - name: Per SOE
    path: hieradata/%{::soe_version}/common.yaml
  - name: Per Location/AZ
    path: hieradata/%{::location}/%{::ec2_placement_availability_zone}/common.yaml
  - name: Per Location
    path: hieradata/%{::location}/common.yaml
  - name: Common
    path: hieradata/common.yaml
  - name: trocla
    lookup_key: trocla_lookup_key
    options:
      trocla_hierarchy:
        - defaults
      config: '/etc/puppetlabs/puppet/troclarc.yaml'
...
Poil commented 6 years ago

I've disabled max-requests-per-instance on puppetserver. After 18h I've got 60 opened connections to the trocla DB on each puppetserver.

duritong commented 6 years ago

I assume, you are still running one node against it? Every 30min.? Which would explain the 60 open connections, right?

Poil commented 6 years ago

No, it's on our production, we have 300 nodes connected on 4 puppetserver. This evening I have 80 connections.

Edit : Today (Sunday) I have 180 connections for each puppetserver

duritong commented 6 years ago

I see, so it's not leaking every run, but every once in a while. Do you still have only one trocla password being looked up through trocla-hiera?

Also how many environments do you have and how many code updates happened in that time?

Poil commented 6 years ago

On the production trocla is heavily use in our manifest, we generate the root password for unix, mysql, mongodb ... There is a minimum of 2 call per nodes and some nodes have more than 10 call to Trocla.

We only have 4 environments on our production.

At the moment we have 260 connections for each puppetserver.

We are going to apply this patch to check if it is these functions that have the leak

diff --git a/lib/puppet/parser/functions/trocla_set.rb b/lib/puppet/parser/functions/trocla_set.rb
index 06da5ae..57a3c8c 100644
--- a/lib/puppet/parser/functions/trocla_set.rb
+++ b/lib/puppet/parser/functions/trocla_set.rb
@@ -39,7 +39,7 @@ trocla, for example via cli.
       args = args[0]
     end

-    key = args[0] 
+    key = args[0]
     value = args[1]
     raise(Puppet::ParseError, "You need to pass at least key & value as an argument!") if key.nil? || value.nil?

@@ -56,8 +56,9 @@ trocla, for example via cli.
     result = (trocla=Trocla.new(configfile)).set_password(key,format,value)
     if format != return_format && (result = trocla.get_password(key,return_format)).nil?
       raise(Puppet::ParseError, "Plaintext password is not present, but required to return password in format #{return_format}") if (return_format == 'plain') || trocla.get_password(key,'plain').nil?
-      result = trocla.password(key,return_format,options) 
+      result = trocla.password(key,return_format,options)
     end
+    trocla.close
     result
   end
 end
diff --git a/lib/puppet/util/trocla_helper.rb b/lib/puppet/util/trocla_helper.rb
index ce583f5..05a2584 100644
--- a/lib/puppet/util/trocla_helper.rb
+++ b/lib/puppet/util/trocla_helper.rb
@@ -22,7 +22,9 @@ module Puppet::Util::TroclaHelper
           options = YAML.load(options)
         end

-        has_options ? store.send(trocla_func, key, format, options) : store.send(trocla_func, key, format)
+        r = has_options ? store.send(trocla_func, key, format, options) : store.send(trocla_func, key, format)
+        store.close
+        r
   end
   module_function :trocla
laugmanuel commented 4 years ago

I know it's three years later, but we just hit the same problem.

@Poil what's the current state on your side? Do you still use the patches mentioned above? Did you notice any performance degradation by explicitly closing the connections after each lookup?

fe80 commented 4 years ago

Hi @laugmanuel

This patch It's ok, we run with somthing like 7-8k trocla_lookup on hiera. No degradation with 4k6 nodes and 9 puppetserver

fe80 commented 4 years ago

image for connections stats

duritong commented 4 years ago

So would anyone like to propose an MR?

duritong commented 4 years ago

So I merged and if @laugmanuel could try and test and if reports are positive I would say we can close it (for now).

laugmanuel commented 4 years ago

@duritong as I wrote in the PR, i've applied the patch yesterday and still see the DB connection leaks. I will see if I can find the cause for us...

fe80 commented 4 years ago

Hi @laugmanuel

Which version of puppetserver you use ? After apply this patch, do you have restart your puppetserver ? The old code can still be load on jruby cache

fe80 commented 4 years ago

and which version of hiera ?

laugmanuel commented 4 years ago

Hi @fe80, we do use Puppetserver 6.13.0, Hiera 5 and the following gem versions:

moneta (1.2.1)
sequel (5.26.0)
trocla (0.3.0)

The Puppetserver got restarted multiple times after applying the patch.

I'm just wondering about the Jruby cache as we run multiple environments on the same Puppetserver. Only one env uses the Hiera Backend for Trocla, but the module is contained in every environment as they use the normal Trocla lookup. I've so far onlky patched the relevant environment which uses the hiera backend.

fe80 commented 4 years ago

If you have the code in multiple environment you need to apply the patch to all of them

laugmanuel commented 4 years ago

I've applied it to all our environments; I will keep an eye on it and report back tomorrow (after the change reaches production).

laugmanuel commented 4 years ago

I can confirm that after applying the patch to every environment on all of our Puppetservers, the connections get closed properly (purple line/blue area are connections to our trocladb): image

I think this Issue can be closed. Thanks for the support and sorry again for digging out this ancient problem 😉

duritong commented 4 years ago

Thank you for reporting back.