duritong closed this issue 4 years ago
Hi,
On this node we have two calls to trocla in the manifest. We also have a call via hiera:
files:
  /tmp/toto:
    content: "%{hiera('trocla_lookup::plain::mykey')}"
This resource is created via a create_resources call in our main manifest.
At each run, a new connection is opened and not closed.
We have patched lib/puppet/functions/trocla_lookup_key.rb:
diff --git a/lib/puppet/functions/trocla_lookup_key.rb b/lib/puppet/functions/trocla_lookup_key.rb
index d377ec8..f61df46 100644
--- a/lib/puppet/functions/trocla_lookup_key.rb
+++ b/lib/puppet/functions/trocla_lookup_key.rb
@@ -33,6 +33,8 @@ Puppet::Functions.create_function(:trocla_lookup_key) do
trocla_hierarchy(trocla_key, format, opts)
end
+ @trocla.close
+
context.not_found unless res
context.cache(key, res)
end
This seems to resolve the leak. We still need to test further to confirm there is no leak from the calls in the manifests.
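A variant of the workaround above, sketched with a stand-in store (FakeStore and lookup_and_close are hypothetical names, not the real Trocla API beyond the close call shown in the diff): putting the close call in an ensure block guarantees the connection is released even when the lookup raises, which a plain trailing @trocla.close does not.

```ruby
# Stand-in for the real Trocla store: "opens" on creation, tracks close.
class FakeStore
  attr_reader :closed

  def initialize
    @closed = false
  end

  def password(key)
    raise ArgumentError, 'empty key' if key.empty?
    "secret-#{key}"
  end

  def close
    @closed = true
  end
end

# The ensure clause runs on success and on error alike, so the
# connection is released either way.
def lookup_and_close(store, key)
  store.password(key)
ensure
  store.close
end

store = FakeStore.new
lookup_and_close(store, 'mykey') # => "secret-mykey"
store.closed                     # => true
```

The same pattern would apply to the trocla_set patch further down in this thread, where an exception between opening and closing the store would otherwise still leak a connection.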
Also, we use these gems:
bcrypt (3.1.11 java)
fast_gettext (1.1.0)
gettext (3.2.2)
gettext-setup (0.26)
highline (1.7.8)
hocon (1.1.3)
jar-dependencies (0.2.6)
jdbc-mysql (5.1.40)
jruby-openssl (0.9.16 java)
json (1.8.0 java)
locale (2.1.2)
moneta (1.0.0)
rake (10.1.0)
rdoc (4.1.2)
semantic_puppet (0.1.3)
sequel (4.46.0)
text (1.3.1)
trocla (0.3.0)
And this config:
---
encryption: :ssl
encryption_options:
  :private_key: /etc/puppetlabs/trocla/ssl/trocla_private_key.pem
  :public_key: /etc/puppetlabs/trocla/ssl/trocla_public_key.pem
options:
  expires: false
  length: 16
  random: true
profiles:
  sysdomain_nc:
    name_constraints:
      - xxx.xxx.xx.xx
store_options:
  adapter: :Sequel
  adapter_options:
    :db: jdbc:mysql://mysqlmaster-trocla.xxxx.xxx.xxx/trocla
    :password: xxxxxxxxxxxxxxxxx
    :port: 3306
    :table: trocla
    :user: trocla
Best regards
Great, thank you a lot for the information.
So given that it is one connection per run and you have one hiera-trocla call, I think the remaining leak is there.
So now I'm wondering why the cache is not working as expected, but for that I need to dig further into how it works.
I'm glad to hear that your patch works, though I don't see it as a long-term solution: a) it works around the actual problem, and b) it makes trocla inefficient under high usage. So I really would like to find the actual problem and a way to properly fix it.
2 remaining questions:
Hi,
If needed I can do more tests and patch our test server :)
Here is our big hiera configuration:
---
version: 5
defaults:
  datadir: '/etc/puppetlabs/code/hieradata'
  data_hash: yaml_data
hierarchy:
  # FQDN (ex: custproj-pw01.adm.fr)
  - name: Per fqdn data
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/fqdn/%{::fqdn}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Client Cert (ex: custproj-pw01.adm.fr)
  - name: Per clientcert data
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/clientcert/%{::trusted.certname}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Role + SubRole + Platform (ex: webserver_prx_production)
  - name: Per role/subrole/platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/%{::subrole}/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Role + Platform (ex: webserver_production)
  - name: Per role/platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/all/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Subrole + Platform (ex: production)
  - name: Per subrole/platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/all/%{::subrole}/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Platform (ex: production)
  - name: Per platform
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/all/all/%{::platform}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Role + SubRole (ex: webserver_prx)
  - name: Per Role/Subrole
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/%{::subrole}/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Role (ex: webserver)
  - name: Per Role
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/%{::role}/all/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Domain
  - name: Per Domain
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/other/%{::domain}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # AWS AZ
  - name: Per AWS AZ
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/other/%{::ec2_placement_availability_zone}/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Project
  - name: Per Project
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/%{::project_code}/claranet/all/all/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Client
  - name: Per Client
    glob: hieradata-nodes-configuration/%{::kernel}/%{::client_code}/all/{classes,params,users,groups,files,packages,ssh_authorized_keys,exec,common}.yaml
  # Common
  - name: Per SOE/Location/AZ
    path: hieradata/%{::soe_version}/%{::location}/%{::ec2_placement_availability_zone}/common.yaml
  - name: Per SOE/Domain
    path: hieradata/%{::soe_version}/%{::domain}/common.yaml
  - name: Per SOE/Location
    path: hieradata/%{::soe_version}/%{::location}/common.yaml
  - name: Per SOE
    path: hieradata/%{::soe_version}/common.yaml
  - name: Per Location/AZ
    path: hieradata/%{::location}/%{::ec2_placement_availability_zone}/common.yaml
  - name: Per Location
    path: hieradata/%{::location}/common.yaml
  - name: Common
    path: hieradata/common.yaml
  - name: trocla
    lookup_key: trocla_lookup_key
    options:
      trocla_hierarchy:
        - defaults
      config: '/etc/puppetlabs/puppet/troclarc.yaml'
...
I've disabled max-requests-per-instance on puppetserver. After 18h I have 60 open connections to the trocla DB on each puppetserver.
I assume you are still running one node against it, every 30 min.? That would explain the 60 open connections, right?
No, this is our production: we have 300 nodes connected to 4 puppetservers. This evening I have 80 connections.
Edit: today (Sunday) I have 180 connections on each puppetserver.
I see, so it's not leaking on every run, but every once in a while. Do you still have only one trocla password being looked up through trocla-hiera?
Also how many environments do you have and how many code updates happened in that time?
In production trocla is heavily used in our manifests; we generate the root passwords for unix, mysql, mongodb ... There is a minimum of 2 calls per node, and some nodes make more than 10 calls to Trocla.
We only have 4 environments on our production.
At the moment we have 260 connections for each puppetserver.
We are going to apply this patch to check whether it is these functions that have the leak:
diff --git a/lib/puppet/parser/functions/trocla_set.rb b/lib/puppet/parser/functions/trocla_set.rb
index 06da5ae..57a3c8c 100644
--- a/lib/puppet/parser/functions/trocla_set.rb
+++ b/lib/puppet/parser/functions/trocla_set.rb
@@ -39,7 +39,7 @@ trocla, for example via cli.
args = args[0]
end
- key = args[0]
+ key = args[0]
value = args[1]
raise(Puppet::ParseError, "You need to pass at least key & value as an argument!") if key.nil? || value.nil?
@@ -56,8 +56,9 @@ trocla, for example via cli.
result = (trocla=Trocla.new(configfile)).set_password(key,format,value)
if format != return_format && (result = trocla.get_password(key,return_format)).nil?
raise(Puppet::ParseError, "Plaintext password is not present, but required to return password in format #{return_format}") if (return_format == 'plain') || trocla.get_password(key,'plain').nil?
- result = trocla.password(key,return_format,options)
+ result = trocla.password(key,return_format,options)
end
+ trocla.close
result
end
end
diff --git a/lib/puppet/util/trocla_helper.rb b/lib/puppet/util/trocla_helper.rb
index ce583f5..05a2584 100644
--- a/lib/puppet/util/trocla_helper.rb
+++ b/lib/puppet/util/trocla_helper.rb
@@ -22,7 +22,9 @@ module Puppet::Util::TroclaHelper
options = YAML.load(options)
end
- has_options ? store.send(trocla_func, key, format, options) : store.send(trocla_func, key, format)
+ r = has_options ? store.send(trocla_func, key, format, options) : store.send(trocla_func, key, format)
+ store.close
+ r
end
module_function :trocla
I know it's three years later, but we just hit the same problem.
@Poil what's the current state on your side? Do you still use the patches mentioned above? Did you notice any performance degradation by explicitly closing the connections after each lookup?
Hi @laugmanuel
The patch is OK; we run with something like 7-8k trocla_lookups via hiera. No degradation with 4.6k nodes and 9 puppetservers.
For connection stats:
So would anyone like to propose an MR?
So I merged it; if @laugmanuel could try and test it, and if reports are positive, I would say we can close this (for now).
@duritong as I wrote in the PR, I've applied the patch yesterday and still see the DB connection leaks. I will see if I can find the cause for us...
Hi @laugmanuel
Which version of puppetserver do you use? After applying this patch, did you restart your puppetserver? The old code can still be loaded in the JRuby cache.
And which version of hiera?
Hi @fe80, we do use Puppetserver 6.13.0, Hiera 5 and the following gem versions:
moneta (1.2.1)
sequel (5.26.0)
trocla (0.3.0)
The Puppetserver got restarted multiple times after applying the patch.
I'm just wondering about the JRuby cache, as we run multiple environments on the same Puppetserver. Only one env uses the Hiera backend for Trocla, but the module is contained in every environment, as they all use the normal Trocla lookup. So far I've only patched the relevant environment which uses the hiera backend.
If you have the code in multiple environments, you need to apply the patch to all of them.
I've applied it to all our environments; I will keep an eye on it and report back tomorrow (after the change reaches production).
I can confirm that after applying the patch to every environment on all of our Puppetservers, the connections get closed properly (purple line/blue area are connections to our trocladb):
I think this Issue can be closed. Thanks for the support and sorry again for digging out this ancient problem 😉
Thank you for reporting back.
As reported on puppet-users, it seems that the hiera-backend is leaking stores:
https://groups.google.com/forum/#!topic/puppet-users/JDfXiA1xnCg
The workaround proposed on the list goes in the same direction as what we discussed in https://github.com/duritong/puppet-trocla/pull/18 and hence introduces the same inefficiency (re-opening the database connection) that I tried to avoid in the patches that followed that PR.
Although I explicitly tried not to run into the same problem as in the original puppet function with the hiera backend, it looks like we are creating objects over and over again and hence keeping tons of connections open. That is only my current guess, based on the little info we have from the report on the list, but it's the only explanation I have so far for the growing connection count.
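The "objects created over and over" guess can be illustrated with a toy model (all names here are hypothetical stand-ins, not the module's actual code): if each catalog compilation builds a fresh store and nothing ever closes it, the open connection count grows with the number of compilations, matching the slow climb observed in this thread.

```ruby
# Stand-in store: "opens" a connection on creation, never closes it.
class Store
  @@open_connections = 0

  def self.open_connections
    @@open_connections
  end

  def initialize
    @@open_connections += 1 # stands in for opening a DB connection
  end

  def password(key)
    "pw-#{key}"
  end
end

# Three catalog compilations, each creating its own store:
3.times do
  store = Store.new
  store.password('mykey')
  # store is garbage-collected eventually, but the underlying
  # connection was never explicitly closed, so it lingers.
end

Store.open_connections # => 3, one leaked connection per compilation
```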
What might interfere is the way puppet sets up the caches for hiera backends; eventually we may need to do that differently than storing the trocla object in the hash. But that's a pure guess at the moment and needs more verification and analysis.
This is mainly a brain dump at the moment, so we don't lose track of what has come up so far.
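One direction that would avoid both the leak and the reopen-per-lookup inefficiency discussed above is to memoize a single store per troclarc path and reuse it across lookups. This is only a sketch under assumed names (CountingStore, STORES, store_for are hypothetical), not the module's actual code:

```ruby
# Stand-in store that counts how many times a connection is opened.
class CountingStore
  @@opened = 0

  def self.opened
    @@opened
  end

  def initialize(config)
    @config = config
    @@opened += 1 # stands in for opening the DB connection once
  end

  def password(key)
    "pw-#{key}"
  end
end

# One store per config file, shared by every subsequent lookup.
STORES = {}

def store_for(config)
  STORES[config] ||= CountingStore.new(config)
end

3.times { store_for('/etc/puppetlabs/puppet/troclarc.yaml').password('root') }

CountingStore.opened # => 1, one connection shared by all three lookups
```

Whether such a process-global memo is actually safe across puppetserver's multiple JRuby instances and environments is exactly the open question raised above, so the lifetime of the cached store would still need verification.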