digitalrane opened this issue 5 years ago
This also happens without the '1.9' version set. The problem appears to be that 'defaults' is not defined, given that the config at that point has no defaults section. Additional logic may be needed to detect and correct this. At this point in configuration, the haproxy config only has the stats frontend, the default port 80 frontend, and the redirect backend:
```
frontend stats
    bind 0.0.0.0:9000
    stats enable
    stats auth admin:admin
    stats uri /ha-stats
    acl local src 10.0.0.0/8 192.168.0.0/16 127.0.0.0/8
    http-request deny if !local

frontend relation-80
    bind 0.0.0.0:80
    default_backend redirect

backend redirect
    redirect scheme https
    server redirect 127.0.0.1:0
```
I've also just noticed now that we're missing 172.x networks from our local ACL.
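For the record, the private range missing from that ACL is 172.16.0.0/12 (per RFC 1918), so the corrected line would presumably read:

```
acl local src 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.0/8
```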
This is an example where parsing application config in a charm has caused problems whereas templating the entire config would have avoided this issue, by the way :stuck_out_tongue:
I'm digging into this, and with no defaults section defined there is no clear way to coax pyhaproxy into creating one. I'm going to keep poking at it.
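For what it's worth, the guard I have in mind is just defensive access. Here's a minimal sketch using a plain dict as a stand-in for the parsed configuration (the real pyhaproxy object model differs, so the names here are illustrative only, not the charm's actual code):

```python
def first_defaults(config):
    """Return the first defaults section, or None when the parsed
    config has no defaults section at all (dict stand-in for illustration)."""
    sections = config.get('defaults') or []
    return sections[0] if sections else None

# Parsed before the package wrote its file: no defaults section yet.
print(first_defaults({'frontends': ['stats', 'relation-80']}))  # None
# Normal case: the package-shipped defaults are present.
print(first_defaults({'defaults': [{'mode': 'http'}]}))
```

Returning None instead of indexing blindly is what turns the traceback into something the hook can handle.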
I'm installing it now to see; it's been a while, but my recollection is that the defaults section is created when HAProxy is installed, and the charm only appends one setting to it, otherwise accepting the package defaults.
So I'm curious: was the file modified after install, or was 1.9 set during install and then switched back to cause the error (that's not supported)? Understanding what's causing this will help inform how to fix it. While we can obviously write a defaults section, I'm curious to know why it's missing in the first place.
I've just run the test for 0.1.5 and it passes on xenial/bionic. Do you have a bundle that reproduces this?
The following bundle reproduces this issue for me, with either 1.9 or 1.7 set. I'm not able to reproduce this on a local LXD provider, but when I deployed on a server using bridged networking (which I mention just for completeness' sake) I got the above backtrace.
```yaml
series: bionic
applications:
  haproxy:
    charm: cs:~pirate-charmers/haproxy
    num_units: 1
    series: xenial
    expose: true
    options:
      version: '1.9'
```
My process was as such -
The defaults section looks like this after deploy:
```
defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    timeout tunnel 1h
```
I think I wrote the title of the PR thinking I fully grokked the problem, but you're right, there are many options that come from the default config file being installed rather than the charm configure hook.
In retrospect, I think what is happening, and what this addresses (and what I initially intended it to address, too), is adding needed safety around accessing the first index of the defaults configuration without first testing whether the defaults section has anything in it. Looking at this more closely, I suspect a race condition: either we are parsing the config before there is a defaults section, or we are trying to access the defaults section before the config has been parsed.
I agree that clobbering the config is the wrong approach, but I'm not 100% sure that's what is actually happening here. We definitely don't want to clobber the config, but I also don't want to throw errors and fail the hook if we can handle it gracefully, which this code appears to do in my testing.
EDIT: removed my local charm hack because my charmstore access is still busted :-1:
If you're talking about the fix in #9, I'm curious: when you use that, do you still get the full defaults section? If this is a race, then when you hit that error you've read in a file with no defaults and will write back over the top of it. Whatever was supposed to create that section is likely to then come along and either overwrite the one we've configured or bail because it didn't expect to find a file there.
This must be a race as you describe since it's changing based on where it's deployed.
The defaults section I provided above was from a charm built with #9 as the only delta from current master. So it fixed the issue and didn't seem to create any problems for me. I think it's potentially unsafe to access the [0]th index of the defaults section when the config file might not yet exist, or the section might not be there; I believe that as soon as the config file does appear, things work as expected.
I agree, accessing it before the file exists is an issue. Accessing the '0' default should be safe; if it's not, we're in an unexpected state. I think we should find the race condition, because I can't explain how you would both hit this bug and get all the defaults. I'm concerned that when re-testing you're simply not hitting the race condition; can you put in a log so we can confirm it's exercising the check?
I actually think the first access is probably a no-op, because all it's checking for is a tunnel timeout, and then configuring based on that. Even if we just skipped the tunnel-timeout part of configuration when defaults isn't defined, that would avoid this behaviour, which is basically what's happening now from what I can tell. I'm not familiar enough with the flag logic in use here to say why this might be running before the package is installed (and therefore before the config is in place). I'll try to reproduce, and see if I can also use one of the apt flags to guard this function from running before the installs have finished.
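To make "skip the tunnel-timeout part" concrete, here is a sketch of the guard I mean. The function name, the `parse` callable, and the dict-shaped parse result are all hypothetical stand-ins, not the charm's actual code:

```python
import os

def set_tunnel_timeout(cfg_path, parse, timeout='1h'):
    """Bail out gracefully instead of raising when the package has not
    yet written its config file, or the parsed config has no defaults."""
    if not os.path.exists(cfg_path):
        return False  # install not finished; retry on a later hook
    config = parse(cfg_path)  # parse() stands in for the pyhaproxy call
    defaults = config.get('defaults') or []
    if not defaults:
        return False  # no defaults section yet; don't clobber package config
    defaults[0]['timeout tunnel'] = timeout
    return True
```

Returning False rather than raising keeps the hook from failing while still leaving the package-shipped defaults untouched for a later run.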
Actually, tunnel_timeout is not part of the default config; it's the one option that the charm hard-codes on its own. At least looking through the code, this is already protected by the haproxy.installed flag. Is it possible fetch.install is non-blocking, such that the installed flag is set before the install is actually complete?
Maybe using the apt layer would solve this. The more I look at this, the more convinced I am that the fix would stop the error but clobber the package defaults. I just can't figure out how fetch could install the package and the config file not yet be written by the time the next hook runs and the tunnel_timeout code executes.
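If we did move to the apt layer, my understanding (unverified against this charm's current layer.yaml, so treat this as an assumption) is that layer:apt only sets the apt.installed.haproxy flag after dpkg has actually finished, which would give us a flag that genuinely implies /etc/haproxy/haproxy.cfg exists. Something like:

```yaml
# layer.yaml (hypothetical sketch)
includes:
  - layer:apt
options:
  apt:
    packages:
      - haproxy
```

Handlers could then gate on apt.installed.haproxy instead of haproxy.installed.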
I've got an assumption here that's wrong somewhere because I can't see how this is triggering.
The haproxy charm, in a fresh deployment, relying solely on default configuration besides setting 'version' to '1.9' (tested previously, and this worked), backtraced on install -
The unit currently has no relations, so I'm wondering if initial deployment with no relations is a problem after the work we've done to support multiple ports, or similar. Needs further debugging.