Closed coderbydesign closed 3 months ago
LGTM, @eguzki need your feedback as well
Thanks @coderbydesign for the fix.
Your fix is indeed right. I overlooked completely the use case for APICAST_CONFIGURATION_CACHE=-1
.
I want to propose an alternative fix, more readable IMHO:
diff --git a/gateway/src/apicast/configuration_loader.lua b/gateway/src/apicast/configuration_loader.lua
index f1eff77e..ec128fe9 100644
--- a/gateway/src/apicast/configuration_loader.lua
+++ b/gateway/src/apicast/configuration_loader.lua
@@ -204,22 +204,15 @@ function boot.init_worker(configuration)
schedule(interval, handler, ...)
end
- -- Check whether the reserved boot configuration is fresh or stale.
- -- If it is stale, refresh configuration
- -- When a worker process is (re-)spawned,
- -- it will start working with fresh (according the ttl semantics) configuration
- local boot_reserved_hosts = configuration:find_by_host(boot_reserved_domain, false)
- if(#boot_reserved_hosts == 0)
- then
- -- the boot configuration has expired, load fresh config
- ngx.log(ngx.INFO, 'boot time configuration has expired')
- -- ngx.socket.tcp is not available at the init or init_worker phases,
- -- it needs to be scheduled (with delay = 0)
- schedule(0, handler, configuration)
- elseif(interval > 0)
- then
- ngx.log(ngx.DEBUG, 'schedule new configuration loading')
- schedule(interval, handler, configuration)
+ if interval > 0 then
+ -- Check whether the reserved boot configuration is fresh or stale.
+ -- If it is stale, refresh configuration
+ -- When a worker process is (re-)spawned,
+ -- it will start working with fresh (according the ttl semantics) configuration
+ local boot_reserved_hosts = configuration:find_by_host(boot_reserved_domain, false)
+ ngx.log(ngx.DEBUG, 'schedule new configuration loading')
+ local curr_interval = #boot_reserved_hosts == 0 and 0 or interval
+ schedule(curr_interval, handler, configuration)
else
ngx.log(ngx.DEBUG, 'no scheduling for configuration loading')
end
I have tested locally succesfully the following use cases:
Common initial setup for all use cases
make development
make dependencies
cat <<EOF >config.json
{
"services": [
{
"backend_version": "1",
"proxy": {
"hosts": [
"example.com"
],
"api_backend": "https://echo-api.3scale.net",
"backend": {
"endpoint": "http://127.0.0.1:8081",
"host": "backend"
},
"proxy_rules": [
{
"http_method": "GET",
"pattern": "/",
"metric_system_name": "hits",
"delta": 1,
"parameters": [],
"querystring_parameters": {}
}
],
"policy_chain": [
{
"name": "apicast.policy.apicast"
}
]
}
}
]
}
EOF
APICAST_CONFIGURATION_CACHE=-1
APICAST_LOG_LEVEL=debug APICAST_WORKERS=1 THREESCALE_CONFIG_FILE=config.json APICAST_CONFIGURATION_LOADER=boot APICAST_CONFIGURATION_CACHE=-1 ./bin/apicast
You should see the (debug) log line about no scheduling for configuration loading
:
2024/06/11 21:51:42 [debug] 249417#249417: *1 configuration_loader.lua:217: init_worker(): no scheduling for configuration loading
Even if you kill the worker process, the re-spawned worked process does not load config and uses the configuration at boot time
docker exec -ti -u root apicast_build_0-development-1 /bin/bash
bash-4.4# ls /proc | grep '[0-9]' | xargs -I {} sh -c '[ -a /proc/{}/cmdline ] && grep -q -P "nginx: worker process" /proc/{}/cmdline && echo killing pid {} && kill {}'
APICAst logs confirm the re-spawned worked process does not load config
2024/06/11 21:54:26 [notice] 249403#249403: signal 17 (SIGCHLD) received from 249417
2024/06/11 21:54:26 [notice] 249403#249403: worker process 249417 exited with code 0
2024/06/11 21:54:26 [notice] 249403#249403: start worker process 249558
2024/06/11 21:54:26 [notice] 249403#249403: signal 29 (SIGIO) received
2024/06/11 21:54:26 [debug] 249558#249558: *2 executor.lua:26: init_worker(): executor phase: init_worker
2024/06/11 21:54:26 [debug] 249558#249558: *2 policy_chain.lua:214: init_worker(): policy chain execute phase: init_worker, policy: Load Configuration, i: 1
2024/06/11 21:54:26 [debug] 249558#249558: *2 configuration_loader.lua:217: init_worker(): no scheduling for configuration loading
APICAST_CONFIGURATION_CACHE=30
APICAST_LOG_LEVEL=debug APICAST_WORKERS=1 THREESCALE_CONFIG_FILE=config.json APICAST_CONFIGURATION_LOADER=boot APICAST_CONFIGURATION_CACHE=30 ./bin/apicast
You should see the (debug) log line about schedule new configuration loading
:
2024/06/11 21:55:48 [debug] 249600#249600: *1 configuration_loader.lua:213: init_worker(): schedule new configuration loading
And every 30 secs,
2024/06/11 21:56:48 [info] 249600#249600: *3 configuration_loader.lua:194: auto updating configuration, context: ngx.timer
The functionality implemented in https://github.com/3scale/APIcast/pull/1399/files still applies:
Thanks for the eyes, @eguzki! I'm going to apply this patch locally and test for sanity as well, but then am happy to update this PR with the proposed change. I was actually curious if that entire conditional could be reworked, but didn't want to change too much without all the historical context. What you've explained makes total sense.
Thanks for your patience on this, @eguzki. When you're able, see the latest update based on your suggestions.
changes looks look to me.
Re-running the test as we have some (hateful) flaky tests
@tkan145 we might want to backport this fix to 3scale-2.15-stable
branch
Details of the issue as well as local reproducers are documented in https://github.com/3scale/APIcast/issues/1467.
This change ensures that when
boot.init_worker
checks to see whether or not the configuration needs to be reloaded, it also checks to confirm that theboot.ttl()
value set fromAPICAST_CONFIGURATION_CACHE
is a positive number, in order to avoid an immediate and infinite recursive loop of refreshing the configuration.This should still preserve the initial fix [1] where this regression appears to have been introduced, as we should never schedule reloading with the configuration settings of:
We should also never have a situation where we'd need to account for
interval = 0
since we fail out inboot.init
in that case [2].Tested locally as documented in the issue, however based on the comment in the initial fix PR, I've not added any tests in code. Happy to do so if there's now a good way to accomplish that!
[1] https://github.com/3scale/APIcast/pull/1399 [2] https://github.com/3scale/APIcast/blob/7e7eaf6f1d584c78d999c0d09f5b65203161c402/gateway/src/apicast/configuration_loader.lua#L146-L149