Update: a follow-up migration run yields this:
2018/10/29 04:40:36 [verbose] Kong: 1.0.0rc2
2018/10/29 04:40:36 [debug] ngx_lua: 10013
2018/10/29 04:40:36 [debug] nginx: 1013006
2018/10/29 04:40:36 [debug] Lua: LuaJIT 2.1.0-beta3
2018/10/29 04:40:36 [verbose] no config file found at /etc/kong/kong.conf
2018/10/29 04:40:36 [verbose] no config file found at /etc/kong.conf
2018/10/29 04:40:36 [verbose] no config file, skip loading
2018/10/29 04:40:36 [debug] reading environment variables
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_USERNAME ENV found with "dbUsername"
2018/10/29 04:40:36 [debug] KONG_PG_USER ENV found with ""
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_DATA_CENTERS ENV found with "DC1:3,DC2:3"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_PASSWORD ENV found with "******"
2018/10/29 04:40:36 [debug] KONG_PG_HOST ENV found with ""
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_SSL ENV found with "on"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_PORT ENV found with "9042"
2018/10/29 04:40:36 [debug] KONG_DATABASE ENV found with "cassandra"
2018/10/29 04:40:36 [debug] KONG_PG_DATABASE ENV found with ""
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_LOCAL_DATACENTER ENV found with "DC1"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_CONSISTENCY ENV found with "LOCAL_QUORUM"
2018/10/29 04:40:36 [debug] KONG_PG_SSL_VERIFY ENV found with ""
2018/10/29 04:40:36 [debug] KONG_PG_SSL ENV found with "off"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_REPL_STRATEGY ENV found with "NetworkTopologyStrategy"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_SCHEMA_CONSENSUS_TIMEOUT ENV found with "120000"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_TIMEOUT ENV found with "20000"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_SSL_VERIFY ENV found with "on"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_CONTACT_POINTS ENV found with "server00467"
2018/10/29 04:40:36 [debug] KONG_PG_PASSWORD ENV found with "******"
2018/10/29 04:40:36 [debug] KONG_LUA_SSL_VERIFY_DEPTH ENV found with "3"
2018/10/29 04:40:36 [debug] KONG_PG_PORT ENV found with ""
2018/10/29 04:40:36 [debug] KONG_LUA_SSL_TRUSTED_CERTIFICATE ENV found with "/usr/local/kong/ssl/kongcert.pem"
2018/10/29 04:40:36 [warn] You are using Cassandra but your 'db_update_propagation' setting is set to '0' (default). Due to the distributed nature of Cassandra, you should increase this value.
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_LB_POLICY ENV found with "RequestDCAwareRoundRobin"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_REPL_FACTOR ENV found with "3"
2018/10/29 04:40:36 [debug] KONG_CASSANDRA_KEYSPACE ENV found with "kong_dev"
2018/10/29 04:40:36 [debug] KONG_NGINX_DAEMON ENV found with "off"
2018/10/29 04:40:36 [debug] admin_access_log = "logs/admin_access.log"
2018/10/29 04:40:36 [debug] admin_error_log = "logs/error.log"
2018/10/29 04:40:36 [debug] admin_listen = {"127.0.0.1:8001","127.0.0.1:8444 ssl"}
2018/10/29 04:40:36 [debug] anonymous_reports = true
2018/10/29 04:40:36 [debug] cassandra_consistency = "LOCAL_QUORUM"
2018/10/29 04:40:36 [debug] cassandra_contact_points = {"server00467"}
2018/10/29 04:40:36 [debug] cassandra_data_centers = {"DC1:3","DC2:3"}
2018/10/29 04:40:36 [debug] cassandra_keyspace = "kong_dev"
2018/10/29 04:40:36 [debug] cassandra_lb_policy = "RequestDCAwareRoundRobin"
2018/10/29 04:40:36 [debug] cassandra_local_datacenter = "DC1"
2018/10/29 04:40:36 [debug] cassandra_password = "******"
2018/10/29 04:40:36 [debug] cassandra_port = 9042
2018/10/29 04:40:36 [debug] cassandra_repl_factor = 3
2018/10/29 04:40:36 [debug] cassandra_repl_strategy = "NetworkTopologyStrategy"
2018/10/29 04:40:36 [debug] cassandra_schema_consensus_timeout = 120000
2018/10/29 04:40:36 [debug] cassandra_ssl = true
2018/10/29 04:40:36 [debug] cassandra_ssl_verify = true
2018/10/29 04:40:36 [debug] cassandra_timeout = 20000
2018/10/29 04:40:36 [debug] cassandra_username = "dbUsername"
2018/10/29 04:40:36 [debug] client_body_buffer_size = "8k"
2018/10/29 04:40:36 [debug] client_max_body_size = "0"
2018/10/29 04:40:36 [debug] client_ssl = false
2018/10/29 04:40:36 [debug] custom_plugins = {}
2018/10/29 04:40:36 [debug] database = "cassandra"
2018/10/29 04:40:36 [debug] db_cache_ttl = 0
2018/10/29 04:40:36 [debug] db_resurrect_ttl = 30
2018/10/29 04:40:36 [debug] db_update_frequency = 5
2018/10/29 04:40:36 [debug] db_update_propagation = 0
2018/10/29 04:40:36 [debug] dns_error_ttl = 1
2018/10/29 04:40:36 [debug] dns_hostsfile = "/etc/hosts"
2018/10/29 04:40:36 [debug] dns_no_sync = false
2018/10/29 04:40:36 [debug] dns_not_found_ttl = 30
2018/10/29 04:40:36 [debug] dns_order = {"LAST","SRV","A","CNAME"}
2018/10/29 04:40:36 [debug] dns_resolver = {}
2018/10/29 04:40:36 [debug] dns_stale_ttl = 4
2018/10/29 04:40:36 [debug] error_default_type = "text/plain"
2018/10/29 04:40:36 [debug] headers = {"server_tokens","latency_tokens"}
2018/10/29 04:40:36 [debug] log_level = "notice"
2018/10/29 04:40:36 [debug] lua_package_cpath = ""
2018/10/29 04:40:36 [debug] lua_package_path = "./?.lua;./?/init.lua;"
2018/10/29 04:40:36 [debug] lua_socket_pool_size = 30
2018/10/29 04:40:36 [debug] lua_ssl_trusted_certificate = "/usr/local/kong/ssl/kongcert.pem"
2018/10/29 04:40:36 [debug] lua_ssl_verify_depth = 3
2018/10/29 04:40:36 [debug] mem_cache_size = "128m"
2018/10/29 04:40:36 [debug] nginx_admin_directives = {}
2018/10/29 04:40:36 [debug] nginx_daemon = "off"
2018/10/29 04:40:36 [debug] nginx_http_directives = {}
2018/10/29 04:40:36 [debug] nginx_optimizations = true
2018/10/29 04:40:36 [debug] nginx_proxy_directives = {}
2018/10/29 04:40:36 [debug] nginx_user = "nobody nobody"
2018/10/29 04:40:36 [debug] nginx_worker_processes = "auto"
2018/10/29 04:40:36 [debug] origins = {}
2018/10/29 04:40:36 [debug] pg_ssl = false
2018/10/29 04:40:36 [debug] pg_ssl_verify = false
2018/10/29 04:40:36 [debug] pg_timeout = 5000
2018/10/29 04:40:36 [debug] plugins = {"bundled"}
2018/10/29 04:40:36 [debug] prefix = "/usr/local/kong/"
2018/10/29 04:40:36 [debug] proxy_access_log = "logs/access.log"
2018/10/29 04:40:36 [debug] proxy_error_log = "logs/error.log"
2018/10/29 04:40:36 [debug] proxy_listen = {"0.0.0.0:8000","0.0.0.0:8443 ssl"}
2018/10/29 04:40:36 [debug] real_ip_header = "X-Real-IP"
2018/10/29 04:40:36 [debug] real_ip_recursive = "off"
2018/10/29 04:40:36 [debug] ssl_cipher_suite = "modern"
Error:
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:143: [Cassandra error] cluster_mutex callback threw an error: /usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:128: database has pending migrations; run 'kong migrations finish'
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:128: in function </usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:124>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/db/init.lua:377: in function 'cluster_mutex'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:124: in function 'up'
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:132: in function 'cmd_exec'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:87>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:44>
/usr/local/bin/kong:7: in function 'file_gen'
init_worker_by_lua:48: in function <init_worker_by_lua:46>
[C]: in function 'xpcall'
init_worker_by_lua:55: in function <init_worker_by_lua:53>
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:143: in function 'up'
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:132: in function 'cmd_exec'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:87>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:44>
/usr/local/bin/kong:7: in function 'file_gen'
init_worker_by_lua:48: in function <init_worker_by_lua:46>
[C]: in function 'xpcall'
init_worker_by_lua:55: in function <init_worker_by_lua:53>
2018/10/29 04:40:36 [debug] ssl_ciphers = "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
2018/10/29 04:40:36 [debug] trusted_ips = {}
2018/10/29 04:40:36 [debug] upstream_keepalive = 60
2018/10/29 04:40:36 [verbose] prefix in use: /usr/local/kong
2018/10/29 04:40:37 [debug] loading subsystems migrations...
2018/10/29 04:40:37 [verbose] retrieving keyspace schema state...
2018/10/29 04:40:37 [verbose] schema state retrieved
2018/10/29 04:40:37 [debug] loading subsystems migrations...
2018/10/29 04:40:37 [verbose] retrieving keyspace schema state...
2018/10/29 04:40:38 [verbose] schema state retrieved
I am starting to get a clearer picture as to why there may be the 3 "pending" executions, which get wrapped up after kong migrations finish gets executed on the newer 1.0rc2 nodes, as described here: https://discuss.konghq.com/t/kong-1-0rc2-available-for-testing/2142. At least that would make sense, so the next step from my perspective is to bring both my nodes up on 1.0rc2 from Kong 0.14.1 (I did test the 0.14.1 nodes' existing proxies and such post initial migration to 1.0rc2, and they seem to be working currently, which is good!)
Update: I cannot go any further with 1.0rc2 testing and finalizing my migrations; it seems custom plugins cannot be run on 1.0rc2 with 0.14.1-compatible schemas:
Info: Using DT_HOME: /opt/dynatrace/oneagent
2018/10/29 04:53:15 [error] 21#0: init_by_lua error: /usr/local/share/lua/5.1/kong/init.lua:312: failed converting legacy schema for kong-splunk-log: unkown legacy field type: text
stack traceback:
[C]: in function 'assert'
/usr/local/share/lua/5.1/kong/init.lua:312: in function 'init'
init_by_lua:3: in main chunk
nginx: [error] init_by_lua error: /usr/local/share/lua/5.1/kong/init.lua:312: failed converting legacy schema for kong-splunk-log: unkown legacy field type: text
stack traceback:
[C]: in function 'assert'
/usr/local/share/lua/5.1/kong/init.lua:312: in function 'init'
init_by_lua:3: in main chunk
https://github.com/Optum/kong-splunk-log/blob/master/src/schema.lua#L8
@thibaultcha hope this gives some insight; it seems a bit of the legacy schema is still not supported, like fields declared as "text", which I think in the new format y'all changed to "string". Let me know if you get a patch in and can drop it into the docker-kong repo, then I can continue testing this all out 👍.
I think I realized another portion of the process that was wrong in my mind for this upgrade (vs how we have always done it in the past).
Right now an env consists of 2 running Kong pods
Migrations for us have consisted of a temporary OpenShift "job" that just spins up a temporary instance of the new Kong version, runs kong migrations up to upgrade the db, and then exits.
After running the migrations up job we bring up the existing 2 kong nodes on the newer version of Kong.
I think what deviates here for this upgrade is that after running kong migrations up, it's necessary to run kong migrations finish from that same 1.0rc2 Kong node (and ONLY then can we launch a 1.0rc2 node). So for now I will modify that job to specifically run kong migrations finish only, to get around this right now in my dev environment.
For future Kong migrations as we iterate versions, would it make sense to just chain both commands, so I can keep the whole migrations process as a separate "job"? Would this be safe from iteration to iteration of Kong 1.0+ and beyond?
command: [ "/bin/sh", "-c", "kong migrations up --vv && kong migrations finish --vv" ]
Okay, after modifying my migrations job to execute just kong migrations finish, we have fresh new errors:
2018/10/29 15:03:48 [verbose] Kong: 1.0.0rc2
2018/10/29 15:03:48 [debug] ngx_lua: 10013
2018/10/29 15:03:48 [debug] nginx: 1013006
2018/10/29 15:03:48 [debug] Lua: LuaJIT 2.1.0-beta3
2018/10/29 15:03:48 [verbose] no config file found at /etc/kong/kong.conf
2018/10/29 15:03:48 [verbose] no config file found at /etc/kong.conf
2018/10/29 15:03:48 [verbose] no config file, skip loading
2018/10/29 15:03:48 [debug] reading environment variables
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_USERNAME ENV found with "dbUsername"
2018/10/29 15:03:48 [debug] KONG_PG_USER ENV found with ""
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_DATA_CENTERS ENV found with "DC1:3,DC2:3"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_PASSWORD ENV found with "******"
2018/10/29 15:03:48 [debug] KONG_PG_HOST ENV found with ""
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_SSL ENV found with "on"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_PORT ENV found with "9042"
2018/10/29 15:03:48 [debug] KONG_DATABASE ENV found with "cassandra"
2018/10/29 15:03:48 [debug] KONG_PG_DATABASE ENV found with ""
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_LOCAL_DATACENTER ENV found with "DC1"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_CONSISTENCY ENV found with "LOCAL_QUORUM"
2018/10/29 15:03:48 [debug] KONG_PG_SSL_VERIFY ENV found with ""
2018/10/29 15:03:48 [debug] KONG_PG_SSL ENV found with "off"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_REPL_STRATEGY ENV found with "NetworkTopologyStrategy"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_SCHEMA_CONSENSUS_TIMEOUT ENV found with "120000"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_TIMEOUT ENV found with "20000"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_SSL_VERIFY ENV found with "on"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_CONTACT_POINTS ENV found with "server00467"
2018/10/29 15:03:48 [debug] KONG_PG_PASSWORD ENV found with "******"
2018/10/29 15:03:48 [debug] KONG_LUA_SSL_VERIFY_DEPTH ENV found with "3"
2018/10/29 15:03:48 [debug] KONG_PG_PORT ENV found with ""
2018/10/29 15:03:48 [debug] KONG_LUA_SSL_TRUSTED_CERTIFICATE ENV found with "/usr/local/kong/ssl/kongcert.pem"
2018/10/29 15:03:48 [warn] You are using Cassandra but your 'db_update_propagation' setting is set to '0' (default). Due to the distributed nature of Cassandra, you should increase this value.
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_LB_POLICY ENV found with "RequestDCAwareRoundRobin"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_REPL_FACTOR ENV found with "3"
2018/10/29 15:03:48 [debug] KONG_CASSANDRA_KEYSPACE ENV found with "kong_dev"
2018/10/29 15:03:48 [debug] KONG_NGINX_DAEMON ENV found with "off"
2018/10/29 15:03:48 [debug] admin_access_log = "logs/admin_access.log"
2018/10/29 15:03:48 [debug] admin_error_log = "logs/error.log"
2018/10/29 15:03:48 [debug] admin_listen = {"127.0.0.1:8001","127.0.0.1:8444 ssl"}
2018/10/29 15:03:48 [debug] anonymous_reports = true
2018/10/29 15:03:48 [debug] cassandra_consistency = "LOCAL_QUORUM"
2018/10/29 15:03:48 [debug] cassandra_contact_points = {"server00467"}
2018/10/29 15:03:48 [debug] cassandra_data_centers = {"DC1:3","DC2:3"}
2018/10/29 15:03:48 [debug] cassandra_keyspace = "kong_dev"
2018/10/29 15:03:48 [debug] cassandra_lb_policy = "RequestDCAwareRoundRobin"
2018/10/29 15:03:48 [debug] cassandra_local_datacenter = "DC1"
2018/10/29 15:03:48 [debug] cassandra_password = "******"
2018/10/29 15:03:48 [debug] cassandra_port = 9042
2018/10/29 15:03:48 [debug] cassandra_repl_factor = 3
2018/10/29 15:03:48 [debug] cassandra_repl_strategy = "NetworkTopologyStrategy"
2018/10/29 15:03:48 [debug] cassandra_schema_consensus_timeout = 120000
2018/10/29 15:03:48 [debug] cassandra_ssl = true
2018/10/29 15:03:48 [debug] cassandra_ssl_verify = true
2018/10/29 15:03:48 [debug] cassandra_timeout = 20000
2018/10/29 15:03:48 [debug] cassandra_username = "dbUsername"
2018/10/29 15:03:48 [debug] client_body_buffer_size = "8k"
2018/10/29 15:03:48 [debug] client_max_body_size = "0"
2018/10/29 15:03:48 [debug] client_ssl = false
2018/10/29 15:03:48 [debug] custom_plugins = {}
2018/10/29 15:03:48 [debug] database = "cassandra"
2018/10/29 15:03:48 [debug] db_cache_ttl = 0
2018/10/29 15:03:48 [debug] db_resurrect_ttl = 30
2018/10/29 15:03:48 [debug] db_update_frequency = 5
2018/10/29 15:03:48 [debug] db_update_propagation = 0
2018/10/29 15:03:48 [debug] dns_error_ttl = 1
2018/10/29 15:03:48 [debug] dns_hostsfile = "/etc/hosts"
2018/10/29 15:03:48 [debug] dns_no_sync = false
2018/10/29 15:03:48 [debug] dns_not_found_ttl = 30
2018/10/29 15:03:48 [debug] dns_order = {"LAST","SRV","A","CNAME"}
2018/10/29 15:03:48 [debug] dns_resolver = {}
2018/10/29 15:03:48 [debug] dns_stale_ttl = 4
2018/10/29 15:03:48 [debug] error_default_type = "text/plain"
2018/10/29 15:03:48 [debug] headers = {"server_tokens","latency_tokens"}
2018/10/29 15:03:48 [debug] log_level = "notice"
2018/10/29 15:03:48 [debug] lua_package_cpath = ""
2018/10/29 15:03:48 [debug] lua_package_path = "./?.lua;./?/init.lua;"
2018/10/29 15:03:48 [debug] lua_socket_pool_size = 30
2018/10/29 15:03:48 [debug] lua_ssl_trusted_certificate = "/usr/local/kong/ssl/kongcert.pem"
2018/10/29 15:03:48 [debug] lua_ssl_verify_depth = 3
2018/10/29 15:03:48 [debug] mem_cache_size = "128m"
2018/10/29 15:03:48 [debug] nginx_admin_directives = {}
2018/10/29 15:03:48 [debug] nginx_daemon = "off"
2018/10/29 15:03:48 [debug] nginx_http_directives = {}
2018/10/29 15:03:48 [debug] nginx_optimizations = true
2018/10/29 15:03:48 [debug] nginx_proxy_directives = {}
2018/10/29 15:03:48 [debug] nginx_user = "nobody nobody"
2018/10/29 15:03:48 [debug] nginx_worker_processes = "auto"
2018/10/29 15:03:48 [debug] origins = {}
2018/10/29 15:03:48 [debug] pg_ssl = false
2018/10/29 15:03:48 [debug] pg_ssl_verify = false
2018/10/29 15:03:48 [debug] pg_timeout = 5000
2018/10/29 15:03:48 [debug] plugins = {"bundled"}
2018/10/29 15:03:48 [debug] prefix = "/usr/local/kong/"
2018/10/29 15:03:48 [debug] proxy_access_log = "logs/access.log"
2018/10/29 15:03:48 [debug] proxy_error_log = "logs/error.log"
2018/10/29 15:03:48 [debug] proxy_listen = {"0.0.0.0:8000","0.0.0.0:8443 ssl"}
2018/10/29 15:03:48 [debug] real_ip_header = "X-Real-IP"
2018/10/29 15:03:48 [debug] real_ip_recursive = "off"
2018/10/29 15:03:48 [debug] ssl_cipher_suite = "modern"
2018/10/29 15:03:48 [debug] ssl_ciphers = "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
2018/10/29 15:03:48 [debug] trusted_ips = {}
2018/10/29 15:03:48 [debug] upstream_keepalive = 60
2018/10/29 15:03:48 [verbose] prefix in use: /usr/local/kong
2018/10/29 15:03:48 [debug] loading subsystems migrations...
2018/10/29 15:03:48 [verbose] retrieving keyspace schema state...
2018/10/29 15:03:48 [verbose] schema state retrieved
2018/10/29 15:03:48 [debug] loading subsystems migrations...
2018/10/29 15:03:48 [verbose] retrieving keyspace schema state...
2018/10/29 15:03:49 [verbose] schema state retrieved
2018/10/29 15:03:49 [debug] pending migrations to finish:
core: 001_14_to_15
oauth2: 001_14_to_15
acl: 001_14_to_15
2018/10/29 15:03:49 [info] migrating core on keyspace 'kong_dev'...
2018/10/29 15:03:49 [debug] running migration: 001_14_to_15
2018/10/29 15:03:52 [info] core migrated up to: 001_14_to_15 (executed)
2018/10/29 15:03:52 [info] migrating oauth2 on keyspace 'kong_dev'...
2018/10/29 15:03:52 [debug] running migration: 001_14_to_15
Error:
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:181: [Cassandra error] cluster_mutex callback threw an error: /usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:176: [Cassandra error] failed to run migration '001_14_to_15' teardown: ...are/lua/5.1/kong/plugins/acl/migrations/001_14_to_15.lua:81: [Invalid] Invalid STRING constant (c8b871b7-7a18-45a9-9cd3-aa0a78c6073d) for "id" of type uuid
stack traceback:
[C]: in function 'assert'
...are/lua/5.1/kong/plugins/acl/migrations/001_14_to_15.lua:81: in function <...are/lua/5.1/kong/plugins/acl/migrations/001_14_to_15.lua:63>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/db/init.lua:791: in function 'run_migrations'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:176: in function </usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:165>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/db/init.lua:377: in function 'cluster_mutex'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:165: in function 'finish'
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:138: in function 'cmd_exec'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:87>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:44>
/usr/local/bin/kong:7: in function 'file_gen'
init_worker_by_lua:48: in function <init_worker_by_lua:46>
[C]: in function 'xpcall'
init_worker_by_lua:55: in function <init_worker_by_lua:53>
stack traceback:
[C]: in function 'assert'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:176: in function </usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:165>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/db/init.lua:377: in function 'cluster_mutex'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:165: in function 'finish'
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:138: in function 'cmd_exec'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:87>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:44>
/usr/local/bin/kong:7: in function 'file_gen'
init_worker_by_lua:48: in function <init_worker_by_lua:46>
[C]: in function 'xpcall'
init_worker_by_lua:55: in function <init_worker_by_lua:53>
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/kong/cmd/utils/migrations.lua:181: in function 'finish'
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:138: in function 'cmd_exec'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:87>
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:44>
/usr/local/bin/kong:7: in function 'file_gen'
init_worker_by_lua:48: in function <init_worker_by_lua:46>
[C]: in function 'xpcall'
init_worker_by_lua:55: in function <init_worker_by_lua:53>
2018/10/29 15:03:52 [info] oauth2 migrated up to: 001_14_to_15 (executed)
2018/10/29 15:03:52 [info] migrating acl on keyspace 'kong_dev'...
2018/10/29 15:03:52 [debug] running migration: 001_14_to_15
Seems the ACL UUIDs are not playing nice with the migration code.
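For context, that "Invalid STRING constant ... of type uuid" message is Cassandra rejecting a uuid value that was interpolated into CQL as a quoted string; a uuid literal must be unquoted. A hypothetical sketch of the bug class (not the actual migration code):

local fmt = string.format
local id = "c8b871b7-7a18-45a9-9cd3-aa0a78c6073d"

-- Quoted: Cassandra parses this as a STRING constant and rejects it for a
-- uuid-typed column, producing the [Invalid] error above.
local bad = fmt("SELECT * FROM acls WHERE id = '%s'", id)

-- Unquoted: parsed as a uuid literal, which the column accepts.
local good = fmt("SELECT * FROM acls WHERE id = %s", id)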
@jeremyjpj0916 Hi, thanks for the feedback.
The migrations are deliberately split in two steps (pending/executed). Not all migrations have two steps, only the ones that would break the underlying schema while two versions of Kong depend on it.
You should not run kong migrations finish until all old nodes (0.14.1 in your case) are brought down. This is why we introduced two-step migrations; see the details posted in the announcement message, which describes the process in detail.
Pinging @hishamhm for the schema conversion issue (we are aware that the legacy schema support is incomplete, as pointed out on Kong Nation a few days ago), and I will submit a patch fixing two things:
Thanks again! Do you have any further questions? Are you considering running 1.0rc2 without the custom plugins that cannot run on it maybe? Or we could also give you pointers to update your custom plugins schema for the short term as well.
(The point of the two-step migrations process really is that in between the migrations up and migrations finish commands, the database is compatible with both 0.14 and 1.0.0rc2 at the same time.)
Wanted to drop in what some of the data in the acls table looks like per record in our dev db, if that helps (our ACL groups are route IDs, for programmatic resolving of which Routes -> Services consumers have access to, since we apply auth plugins on the route resources):
@ Row 10
-------------+--------------------------------------
id | bd632f35-b1d5-48d6-9b96-1f204b1a0db6
cache_key | null
consumer_id | 4a0a3c36-edf3-473c-bf01-cf9469f40746
created_at | 2018-08-09 20:08:57.829000+0000
group | 1a16155e-432d-4758-855b-c75dac4db3d4
@ Row 11
-------------+--------------------------------------
id | e8be8ed5-3962-4ba3-ab6f-2334ad46e4a2
cache_key | null
consumer_id | 8aa31296-a4ec-4cb4-bfc7-64a5428b3c65
created_at | 2018-08-22 14:41:44.564000+0000
group | e294c536-625e-454b-865d-b54889803d11
(11 rows)
Okay, so the fact my custom plugin caused errors against the 1.0rc2 node prior to the migrations finish was not intended by Kong; glad to get that clarification, as I was thinking maybe the custom plugin incompatibility was due to my lack of running the second migrations command. But at this stage I did run the migrations finish as a job, which did primarily push us to 1.0rc2 compatibility minus the acl bug (with .14 nodes still up and running, so they probably are whacky, but hey, it's dev so not a big deal). So maybe what I do now is just roll back the db to a day ago and give this a go at a later date, when custom plugins with legacy schemas can be supported on 1.0rc2 and the ACLs C* migration command is fixed up.
Ideally I would like the 0.14.1 to 1.0 GA upgrade to go smoothly with the minimal steps possible. I am hoping it could go something like:
Sounds like this may not be set in stone, since it's not clear yet if 1.0 will still allow both the older plugin schemas and the new format, or force a breaking change for all custom plugins by only allowing the new format. I would prefer 1.0 having support for both, so we can separate the Kong upgrade from the plugin refactors and not do everything all at once. Another option would be to run all environments and prod on 1.0rc2, if it will have support for old and new schemas, and then work on changing up the plugins before doing another upgrade from 1.0rc2 to 1.0 (which at that point probably would not even require further db changes, just pulling down the latest Kong 1.0 from bintray). Would feel weird running an rc in prod though, so I'd prefer not to go that route.
Thanks again! Do you have any further questions? Are you considering running 1.0rc2 without the custom plugins that cannot run on it maybe? Or we could also give you pointers to update your custom plugins schema for the short term as well.
Not really looking to run Kong without the plugins, as our plugins are what enable us to validate Kong is behaving as expected and provide all our important logging; even in dev we run all our plugins so it stays in line with other environments. I think I know what needs to change in our custom plugins when the time comes, but going back to separation of tasks was the ideal (upgrade Kong | change 8 or so plugins). If not possible, I may refactor how we pull from luarocks to actually use versions in our deploys, because right now I just pull the latest, which you can see in situations like this would not work. Say I update all our plugins now; well, prod runs on that luarocks latest, so the next build would break prod 😄. Really my fault more than anything, but it's kind of nice to minimize the edits on our build scripts and not worry about versioning plugins when pulling from luarocks (for plugins we manage; obviously an outsider's luarocks plugin is subject to change at any time).
@jeremyjpj0916 Regarding the plugin error at https://github.com/Optum/kong-splunk-log/blob/master/src/schema.lua#L8 the correct name for that type is "string" (even in 0.x schemas), but if that is working for you then it seems that "text" is being silently ignored by the 0.x schema validator? (It shouldn't be the case from my reading of kong/dao/schemas_validation.lua, though, so something weird is at play.)
Changing that type to "string" in your plugin should make it work on both 0.x and 1.x. If 0.x is indeed treating unknown types as "string", I guess we could replicate that behavior in our auto-converter to improve backwards compatibility.
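To make that concrete, the one-word fix in a 0.x-style plugin schema would look something like this (the field name here is illustrative, not the actual kong-splunk-log schema):

-- 0.x-style plugin schema excerpt; "text" is not a recognized field type
-- and trips 1.0's legacy schema converter ("unkown legacy field type: text"):
return {
  fields = {
    -- splunk_endpoint = { required = true, type = "text" },  -- breaks on 1.0
    splunk_endpoint = { required = true, type = "string" },   -- works on 0.x and 1.x
  }
}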
@hishamhm that's funny to learn it's wrong for 0.x too 😆, but yeah, it does run and work fine on 0.14.x. Let me see if changing to string is okay for the existing 0.14.1 version myself; if so then I have no problem making that change now (which might fix the schema error I mentioned earlier).
I also don't think you necessarily need to replicate that behavior, but other dummies like myself may end up posting with similar errors if they leverage a plugin with a wrong schema type 😺.
@thibaultcha if I were so daring as to continue on 1.0rc2 with hishamhm's suggestion, is there a cqlsh query I could just run to get the acl table into the needed state (if there even are further changes) while you work on the migration query fix?
Will come back with the results of fixing my plugin's schema though (and reviewing any other plugins).
UPDATE:
Well, fixing from "text" to "string" did get us launched for that custom plugin. New error though, around an instance of the existing Kong correlation-id plugin enabled at runtime:
2018/10/29 23:08:15 [error] 37#0: *821 failed to run header_filter_by_lua*: ...al/share/lua/5.1/kong/plugins/correlation-id/handler.lua:84: invalid header value for "optum-cid-ext": got nil, expected string, number or boolean
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/kong/pdk/private/checks.lua:57: in function 'validate_header'
/usr/local/share/lua/5.1/kong/pdk/response.lua:303: in function 'set_header'
...al/share/lua/5.1/kong/plugins/correlation-id/handler.lua:84: in function 'header_filter'
/usr/local/share/lua/5.1/kong/init.lua:691: in function 'header_filter'
header_filter_by_lua:2: in function <header_filter_by_lua:1>
Where optum-cid-ext is the specific header we set the correlation id to.
DB Data
dbUsername@cqlsh:kong_dev> select * from plugins where name='correlation-id';
@ Row 1
-------------+-----------------------------------------------------------------------------------
id | b0b4a840-c6fe-4453-b9ec-e76de48658e6
api_id | null
cache_key | plugins:correlation-id::::
config | {"echo_downstream":true,"header_name":"optum-cid-ext","generator":"uuid#counter"}
consumer_id | null
created_at | 2018-07-01 00:08:23.917000+0000
enabled | True
name | correlation-id
route_id | null
service_id | null
(1 rows)
Decided to glance at your source too:
function CorrelationIdHandler:access(conf)
  CorrelationIdHandler.super.access(self)

  -- Set header for upstream
  local header_value = kong.request.get_header(conf.header_name)
  if not header_value then
    -- Generate the header value
    header_value = generators[conf.generator]()
    kong.service.request.set_header(conf.header_name, header_value)
  end

  if conf.echo_downstream then
    -- For later use, to echo it back downstream
    kong.ctx.plugin.correlationid_header_value = header_value
  end
end

function CorrelationIdHandler:header_filter(conf)
  CorrelationIdHandler.super.header_filter(self)

  if conf.echo_downstream then
    kong.response.set_header(conf.header_name, kong.ctx.plugin.correlationid_header_value)
  end
end
Somehow kong.ctx.plugin.correlationid_header_value is nil in the header_filter phase for me. The logic above seems sound to me (set it in the access phase, come back and use it in the response header_filter phase if needed), so this is an interesting error we've got going here.
Update with further thoughts I have had on possible culprits:
The specific route throwing the errors happens to be our healthcheck endpoint (yay, our dev cluster is officially dead to our loadbalancers 🤣). It also has the request-termination plugin enabled, set to return HTTP status 200 with a Kong message body of "Success". On 0.14.x, correlation-id (which I have as a global plugin, btw) does not seem to be executing on this route with request-termination (on 1.0 it certainly is, hence the nil pointer error logs and the proxy call failing). I have also tested other proxy URLs through the Kong node terminal using localhost and they work, same for the admin API, so it seems it's JUST the request-termination plugin sharing a route with correlation-id that causes this break (probably correlation-id, global or local to a route with request-termination, would reproduce this in y'all's test env).
Edit again - YES, it was the fact I had request-termination enabled on this route. Disabling that plugin made the route work again (and subsequently I can see the correlation id in the headers on response). So, Kong squad gurus, something has changed in behavior for when correlation-id and request-termination share a route, and it produces a nil pointer on an expected header value for the correlation-id plugin during the response header_filter phase. I will leave y'all to it; it probably has to do with clashing priority, or one plugin overwriting the local ctx of the other plugin somehow?? I am not sure exactly.
Edit - Seems the request-termination plugin's config changed during migration or something... I honestly dunno what happened. I may try from scratch and see what happens again:
dbUsername@cqlsh:kong_dev> select * from plugins where name='request-termination';
@ Row 1
-------------+---------------------------------------------------------------------
id | ae61e25d-ae91-458e-b4c4-37913e32c6cf
api_id | null
cache_key | plugins:request-termination:375d9a66-73eb-417a-a85c-b20f918017b0:::
config | {"status_code":503,"body":null,"content_type":null,"message":null}
consumer_id | null
created_at | 2018-06-30 23:42:12.000000+0000
enabled | False
name | request-termination
route_id | 375d9a66-73eb-417a-a85c-b20f918017b0
service_id | null
Vs a 0.14.x Stage
dbUsername@cqlsh:kong_stage> select * from plugins where name='request-termination';
@ Row 1
-------------+-----------------------------------------
id | c4ae2364-09b8-4794-867c-a791b9c29630
name | request-termination
api_id | null
config | {"status_code":200,"message":"Success"}
consumer_id | null
created_at | 2018-04-04 05:48:30.993000+0000
enabled | True
route_id | 9e7434d4-00e7-4ecc-886b-39d2447d1d6f
service_id | null
How dev got to code 503 and all those nulls I am not sure... I also tried to patch the plugin back to a working state, and that didn't seem to work either:
curl -X PATCH http://localhost:8001/plugins/ae61e25d-ae91-458e-b4c4-37913e32c6cf --data "enabled=true" --data "status_code=200" --data "created_at=1530402132" --data "message=Success"
Edit (I am dumb, fixed my curl and the patch worked):
curl -X PATCH http://localhost:8001/plugins/ae61e25d-ae91-458e-b4c4-37913e32c6cf --data "enabled=true" --data "config.status_code=200" --data "created_at=1530402132" --data "config.message=Success"
For now I will write off the 503 and nulls as a failed patch attempt at some earlier point that messed it up and caused some 503 default. Probably not the migrations itself that did that.
Last edit - Okay, even with the termination plugin set to 200 Success again in dev, the correlation-id plugin is still throwing those nil pointer errors and the route healthcheck is failing... So even if migrations did something weird to that request-termination plugin, in its proper config state it and correlation-id are definitely still clashing.
Last last edit - Also, disabling the global correlation-id plugin while leaving the request-termination plugin got my healthcheck endpoint working again as well. So yeah, these plugins are definitely not playing nice with each other in 1.0.
TLDR - Wow, I realize I wrote a book: the correlation-id plugin (enabled globally too, not that it probably matters) and the request-termination plugin sharing a route breaks the correlation-id plugin, and subsequently the proxied route itself fails to respond successfully.
I thought about why they clash in 1.0, looking at the code at the moment (I have not compared 0.14.1's logic though, to see what changed and why this works in 0.14.1):
We see correlation-id executes with priority 1, with its access phase setting the header value. We see request-termination with priority 2, with its access phase setting headers of its own and doing a kong.response.exit().
Then somehow we still end up executing correlation-id's header_filter phase, which we all know happens after a proxy call to the backend (the backend in our case being the localhost:8001/status admin endpoint; by now, request-termination has left the correlation-id plugin's ctx header value as nil during this phase). And the client receives an error, and not what was set with request-termination in the first place. Strange stuff.
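For reference, request-termination's access phase short-circuits the request along these lines (a paraphrased sketch, not the exact 1.0 source):

function RequestTerminationHandler:access(conf)
  RequestTerminationHandler.super.access(self)

  -- Ends the request here: no upstream call happens, and lower-priority
  -- plugins (like correlation-id at priority 1) never get their access
  -- phase, so kong.ctx.plugin.correlationid_header_value is never assigned.
  -- Kong still runs every plugin's header_filter on this Kong-produced
  -- response, which is where the nil header value blows up.
  return kong.response.exit(conf.status_code,
                            conf.body or { message = conf.message },
                            { ["Content-Type"] = conf.content_type })
end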
A dirty fix for this in correlation-id for v1.0 would be:
if conf.echo_downstream and kong.ctx.plugin.correlationid_header_value then
  kong.response.set_header(conf.header_name, kong.ctx.plugin.correlationid_header_value)
end
But that does not solve the root cause of why the header_filter phase of correlation-id would execute at all, since request-termination's access phase should have been the end of Kong's plugin execution.
Other fixes (maybe dirty as well) would be swapping the plugins' priorities of 1 and 2. Do proxies that terminate with static content REALLY need a correlation id (which is generally used for tx-level debugging of calls that actually route to backends)?
I personally was fine with it not returning the correlation id in the response header for this specific ping endpoint with request termination, which is the behavior of 0.14.1.
Hi @jeremyjpj0916,
Some news:
- Your schema.lua should not break (assuming it is free of bugs, which is still not 100% proven given this has not been tested with a wide variety of schemas out there, such as yours).
- The changed config values you observed are worrying; please let us know if you happen to confirm this comes from the migrations themselves, or from another issue with your upgrade path. @kikito's next agenda item is to write our 0.14 -> 0.15 (aka 1.0) upgrade test suite, and ensuring all of our new migrations are non-breaking. We haven't had time to tackle it until now, sorry... Hopefully more to come on the stability of these migrations.
Regarding "why the header_filter phase of correlation-id would execute at all since request-termination's access phase should have been the end of kong's plugin execution":
It is actually intended behavior that the header_filter phase gets executed (along with subsequent ones), since otherwise plugins lack the opportunity of acting upon some requests (or responses) produced by Kong itself, which leads to some blindness of the logging plugins, etc... You may remember #3533, which made this explicit and clearly stated that the response phases should be executed for all responses (whether they are proxied from upstream, produced by Kong, or produced by Nginx). It is the plugin's responsibility to handle those cases, and it seems like the correlation-id plugin does not handle it well. FWIW, plugins have been running their header_filter phase on Kong-produced responses for a very long time. The issue is that:
1. ngx.header[header_name] = nil is valid ngx_lua, and ensures the header is removed from the response.
2. kong.response.set_header() does not accept nil as an argument (the underlying error I believe occurs from this call to tostring(): https://github.com/Kong/kong/blob/master/kong/pdk/private/checks.lua#L41).
Since kong.response.clear_header() exists to remove a response header, it'd be acceptable to reject kong.response.set_header(header_name, nil). In fact, that is even a tested behavior (see https://github.com/Kong/kong/blob/1.0.0rc2/t/01-pdk/08-response/05-set_header.t#L80). However, that also makes the API unnecessarily restrictive (now branches would have to be written to unset a header if a variable is nil, or to set it if it is a valid header value (number, string...)):
-- cumbersome
if value ~= nil then
  kong.response.set_header("foo", value)
else
  kong.response.clear_header("foo")
end

-- leaner
kong.response.set_header("foo", value)
That said, the correlation-id plugin should not unset the header if it isn't setting it, which is currently the behavior it follows.
In short, we need a few action items:
- correlation-id should implement your patch (or rely on kong.response.get_source to determine (or "assume") whether its access phase was run or not). Your contribution would be welcome here @jeremyjpj0916!
- Decide whether to accept nil arguments to the set_header() utilities, although this has a lower priority.
As always, thank you for all the details and collaboration!
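For illustration, the first action item could look something like this in the plugin's header_filter (a sketch, assuming kong.response.get_source() returns "service" only for responses proxied from the upstream, and e.g. "exit" for ones produced via kong.response.exit()):

function CorrelationIdHandler:header_filter(conf)
  CorrelationIdHandler.super.header_filter(self)

  if not conf.echo_downstream then
    return
  end

  -- If another plugin (e.g. request-termination) short-circuited the
  -- request, our access phase never ran and the ctx value is nil: skip
  -- echoing the header in that case.
  if kong.response.get_source() ~= "service" then
    return
  end

  kong.response.set_header(conf.header_name, kong.ctx.plugin.correlationid_header_value)
end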
@thibaultcha and thank you for digging deep and taking the time to investigate and address accordingly; the analysis of the correlation-id and request-termination plugin clash seems spot on to me now. All this effort from yourself and team + community testing will yield a great 1.0 release. Hopefully most of the plugin conversions to the PDK will not yield clashing behaviors, and what I stumbled into was more of a strange situational case. Exhaustive combinations of Kong plugins on top of services and routes would be pretty tiring to make, but maybe someone internal to Kong could make that its own big test suite 😄 (at the end of the day it would probably catch a lot of future PDK changes that could break Kong core plugins in ways like I saw).
My takeaways for next steps will be this:
I already have our dev env reverted back to 0.14.1 again; I am going to test the migrations using 1.0rc2 with the acl bug still in place, but ensure the request-termination plugin's table row migrates as expected, to wrap that up for yah and leave no doubt about any weirdness there.
Some sort of correlation-id patch next. But wouldn't a PDK adjustment to allow nil arguments to set header be cleaner, and require no further code change to the plugin itself? And what should the underlying action on setting a header to nil be in the PDK? Just ngx.header[header_name] = nil under the hood would be my first thought.
Looking forward to testing 1.0rc3 for the acl fix and trying the full end-to-end migration flow again. Maybe we will have squeezed in the fix for the plugin clash error described above too, and I will be able to validate other plugins for potential problems as well.
@jeremyjpj0916 Oh, an item I left out from my previous comment on updates to this thread: work on the migrations commands (up and finish) for Cassandra; @bungle is on it :)
Maybe this logic for the pdk checks.lua, to allow/account for a nil value in header set?
function checks.normalize_header(value)
  local tvalue = type(value)

  -- note: type() returns the string "nil", never nil itself
  if tvalue == "nil" then
    return nil
  end

  if tvalue == "string" then
    return value == "" and " " or value
  end

  -- header is number or boolean
  return tostring(value)
end
function checks.validate_header(name, value)
  local tname = type(name)
  if tname ~= "string" then
    error(fmt("invalid header name %q: got %s, " ..
              "expected string", name, tname), 3)
  end

  local tvalue = type(value)
  if tvalue ~= "string" then
    if tvalue == "number" or tvalue == "boolean" then
      value = tostring(value)

    elseif tvalue == "nil" then
      -- value is already nil and considered valid; do nothing

    else
      error(fmt("invalid header value for %q: got %s, expected " ..
                "nil, string, number or boolean", name, tvalue), 3)
    end
  end

  return value
end
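A quick sanity check of the sketch above (the key detail being that Lua's type() returns the string "nil", never nil itself, hence the quoted comparisons); checks here is assumed to be the module table these functions live on:

assert(checks.normalize_header(nil) == nil)          -- nil passes through untouched
assert(checks.normalize_header("") == " ")           -- empty string becomes a space
assert(checks.normalize_header(42) == "42")          -- numbers are stringified
assert(checks.validate_header("X-Foo", nil) == nil)  -- nil is now accepted as "unset"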
I agree that the current set_header behavior is more cumbersome than the leaner "nil to unset", but I would argue that it is less error-prone:
That said, the correlation-id should not unset the header if it isn't setting it, which is currently the behavior it follows.
When a plugin author forgets to test the nil case (and it happened to be nil just because there's no new header value to set), the lean version will happily unset it silently. The current cumbersome version will complain and make sure the author specified their intent in no uncertain terms.
As with everything, it's a balance between safety and convenience.
I'd prefer if we had an explicit way to state that the desired behavior is "set-or-clear". So, we'd have either a very explicit set_or_clear_header function, or mark set_header with an optional "mode" argument. I'm not a fan of magic booleans in function arguments as these are hard to read and tend to accumulate, so perhaps a string one, taking a page from other Lua APIs?
-- not as lean, but safer
kong.response.set_header("foo", value, "*clear")
(I considered suggesting kong.response.set_header("foo", value or kong.null) to mean set-or-clear, but that would break when value is false.)
Sounds like there may be time for internal discussion around how to best lay it out in the PDK for Kong. I prefer simplicity for an end user, so if I want to set a header to nil I should be allowed to do so by calling something that is as straightforward and human-readable as: ngx.header[header_name] = nil
For now I don't see too much harm in having the correlation-id plugin have this logic:
if conf.echo_downstream and kong.ctx.plugin.correlationid_header_value then
  kong.response.set_header(conf.header_name, kong.ctx.plugin.correlationid_header_value)
end
So if there are not any objections, I will PR this later so the 2 plugins don't break me on 1.0rc3 while testing around for now. But that PR will come after the http-log PR gets wrapped up, because I am a scrub and have not figured out how to break out commits on the same branch into separate PRs (if that's even possible 😆). Gave that a go earlier today at one point with my pdk change idea and realized it leaked right into my other http-log PR lol...
The fix to the correlation-id plugin is required regardless of the direction followed by the PDK anyway (unless passing nil to set_header is updated to be a NOP), so yes, please go ahead with the fix! :)
@thibaultcha's comment shows the problem with nil semantics there. It is not clear if set_header("x", nil) will just return early or if it should actually remove the existing header. I guess whatever we decide, there can be confusion, and we get "I thought it would remove the header" or "I thought it would skip setting anything because there is nothing to set". The current design is immune to this.
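A two-line illustration of that ambiguity (the header name here is hypothetical):

local value = kong.request.get_header("x-upstream-id")  -- may be nil
kong.response.set_header("x-upstream-id", value)
-- With "nil means clear" semantics this silently removes any existing header
-- whenever the request didn't carry one; the strict API raises instead,
-- forcing the author to state which behavior they intended.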
Used my colleague's account to PR the fix for correlation-id when request-termination is present on a service/route.
Btw, not allowing nil on set header has introduced a new bug: https://github.com/Kong/kong/issues/3964
Hmm, guess we will find out just how many plugins were impacted by that decision as we progress here and get into each one's nitty-gritty config options.
@jeremyjpj0916 Last Wednesday, we planned some work to audit plugins and ensure none is vulnerable to this bug. @bungle is on it but hasn't been through all of them yet :)
@thibaultcha, I did test your 1.0rc3 latest version candidate; the initial migrations command seemed to work, but I can't bring up a node currently.
Result of initial migrations:
2018/12/10 23:08:09 [verbose] Kong: 1.0.0rc3
2018/12/10 23:08:09 [debug] ngx_lua: 10013
2018/12/10 23:08:09 [debug] nginx: 1013006
2018/12/10 23:08:09 [debug] Lua: LuaJIT 2.1.0-beta3
2018/12/10 23:08:09 [verbose] no config file found at /etc/kong/kong.conf
2018/12/10 23:08:09 [verbose] no config file found at /etc/kong.conf
2018/12/10 23:08:09 [verbose] no config file, skip loading
2018/12/10 23:08:09 [debug] reading environment variables
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_USERNAME ENV found with "dbusername"
2018/12/10 23:08:09 [debug] KONG_PG_USER ENV found with ""
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_DATA_CENTERS ENV found with "DC2:3,DC1:3"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_PASSWORD ENV found with "******"
2018/12/10 23:08:09 [debug] KONG_PG_HOST ENV found with ""
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_SSL ENV found with "on"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_PORT ENV found with "9042"
2018/12/10 23:08:09 [debug] KONG_DATABASE ENV found with "cassandra"
2018/12/10 23:08:09 [debug] KONG_PG_DATABASE ENV found with ""
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_LOCAL_DATACENTER ENV found with "DC1"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_CONSISTENCY ENV found with "LOCAL_QUORUM"
2018/12/10 23:08:09 [debug] KONG_PG_SSL_VERIFY ENV found with ""
2018/12/10 23:08:09 [debug] KONG_PG_SSL ENV found with "off"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_REPL_STRATEGY ENV found with "NetworkTopologyStrategy"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_SCHEMA_CONSENSUS_TIMEOUT ENV found with "120000"
2018/12/10 23:08:09 [debug] KONG_PG_PASSWORD ENV found with "******"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_TIMEOUT ENV found with "20000"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_SSL_VERIFY ENV found with "on"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_CONTACT_POINTS ENV found with "server00466"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_REPL_FACTOR ENV found with "3"
2018/12/10 23:08:09 [debug] KONG_PG_PORT ENV found with ""
2018/12/10 23:08:09 [debug] KONG_LUA_SSL_VERIFY_DEPTH ENV found with "3"
2018/12/10 23:08:09 [debug] KONG_LUA_SSL_TRUSTED_CERTIFICATE ENV found with "/usr/local/kong/ssl/kongcert.pem"
2018/12/10 23:08:09 [warn] You are using Cassandra but your 'db_update_propagation' setting is set to '0' (default). Due to the distributed nature of Cassandra, you should increase this value.
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_LB_POLICY ENV found with "RequestDCAwareRoundRobin"
2018/12/10 23:08:09 [debug] KONG_CASSANDRA_KEYSPACE ENV found with "kong_dev"
2018/12/10 23:08:09 [debug] KONG_NGINX_DAEMON ENV found with "off"
2018/12/10 23:08:09 [debug] admin_access_log = "logs/admin_access.log"
2018/12/10 23:08:09 [debug] admin_error_log = "logs/error.log"
2018/12/10 23:08:09 [debug] admin_listen = {"127.0.0.1:8001","127.0.0.1:8444 ssl"}
2018/12/10 23:08:09 [debug] anonymous_reports = true
2018/12/10 23:08:09 [debug] cassandra_consistency = "LOCAL_QUORUM"
2018/12/10 23:08:09 [debug] cassandra_contact_points = {"server00466"}
2018/12/10 23:08:09 [debug] cassandra_data_centers = {"DC2:3","DC1:3"}
2018/12/10 23:08:09 [debug] cassandra_keyspace = "kong_dev"
2018/12/10 23:08:09 [debug] cassandra_lb_policy = "RequestDCAwareRoundRobin"
2018/12/10 23:08:09 [debug] cassandra_local_datacenter = "DC1"
2018/12/10 23:08:09 [debug] cassandra_password = "******"
2018/12/10 23:08:09 [debug] cassandra_port = 9042
2018/12/10 23:08:09 [debug] cassandra_repl_factor = 3
2018/12/10 23:08:09 [debug] cassandra_repl_strategy = "NetworkTopologyStrategy"
2018/12/10 23:08:09 [debug] cassandra_schema_consensus_timeout = 120000
2018/12/10 23:08:09 [debug] cassandra_ssl = true
2018/12/10 23:08:09 [debug] cassandra_ssl_verify = true
2018/12/10 23:08:09 [debug] cassandra_timeout = 20000
2018/12/10 23:08:09 [debug] cassandra_username = "dbusername"
2018/12/10 23:08:09 [debug] client_body_buffer_size = "8k"
2018/12/10 23:08:09 [debug] client_max_body_size = "0"
2018/12/10 23:08:09 [debug] client_ssl = false
2018/12/10 23:08:09 [debug] custom_plugins = {}
2018/12/10 23:08:09 [debug] database = "cassandra"
2018/12/10 23:08:09 [debug] db_cache_ttl = 0
2018/12/10 23:08:09 [debug] db_resurrect_ttl = 30
2018/12/10 23:08:09 [debug] db_update_frequency = 5
2018/12/10 23:08:09 [debug] db_update_propagation = 0
2018/12/10 23:08:09 [debug] dns_error_ttl = 1
2018/12/10 23:08:09 [debug] dns_hostsfile = "/etc/hosts"
2018/12/10 23:08:09 [debug] dns_no_sync = false
2018/12/10 23:08:09 [debug] dns_not_found_ttl = 30
2018/12/10 23:08:09 [debug] dns_order = {"LAST","SRV","A","CNAME"}
2018/12/10 23:08:09 [debug] dns_resolver = {}
2018/12/10 23:08:09 [debug] dns_stale_ttl = 4
2018/12/10 23:08:09 [debug] error_default_type = "text/plain"
2018/12/10 23:08:09 [debug] headers = {"server_tokens","latency_tokens"}
2018/12/10 23:08:09 [debug] log_level = "notice"
2018/12/10 23:08:09 [debug] lua_package_cpath = ""
2018/12/10 23:08:09 [debug] lua_package_path = "./?.lua;./?/init.lua;"
2018/12/10 23:08:09 [debug] lua_socket_pool_size = 30
2018/12/10 23:08:09 [debug] lua_ssl_trusted_certificate = "/usr/local/kong/ssl/kongcert.pem"
2018/12/10 23:08:09 [debug] lua_ssl_verify_depth = 3
2018/12/10 23:08:09 [debug] mem_cache_size = "128m"
2018/12/10 23:08:09 [debug] nginx_admin_directives = {}
2018/12/10 23:08:09 [debug] nginx_daemon = "off"
2018/12/10 23:08:09 [debug] nginx_http_directives = {}
2018/12/10 23:08:09 [debug] nginx_optimizations = true
2018/12/10 23:08:09 [debug] nginx_proxy_directives = {}
2018/12/10 23:08:09 [debug] nginx_user = "nobody nobody"
2018/12/10 23:08:09 [debug] nginx_worker_processes = "auto"
2018/12/10 23:08:09 [debug] origins = {}
2018/12/10 23:08:09 [debug] pg_ssl = false
2018/12/10 23:08:09 [debug] pg_ssl_verify = false
2018/12/10 23:08:09 [debug] pg_timeout = 5000
2018/12/10 23:08:09 [debug] plugins = {"bundled"}
2018/12/10 23:08:09 [debug] prefix = "/usr/local/kong/"
2018/12/10 23:08:09 [debug] proxy_access_log = "logs/access.log"
2018/12/10 23:08:09 [debug] proxy_error_log = "logs/error.log"
2018/12/10 23:08:09 [debug] proxy_listen = {"0.0.0.0:8000","0.0.0.0:8443 ssl"}
2018/12/10 23:08:09 [debug] real_ip_header = "X-Real-IP"
2018/12/10 23:08:09 [debug] real_ip_recursive = "off"
2018/12/10 23:08:09 [debug] ssl_cipher_suite = "modern"
2018/12/10 23:08:09 [debug] ssl_ciphers = "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
2018/12/10 23:08:09 [debug] stream_listen = {"off"}
2018/12/10 23:08:09 [warn] 9#0: *2 [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address, context: ngx.timer
2018/12/10 23:08:09 [debug] trusted_ips = {}
2018/12/10 23:08:09 [debug] upstream_keepalive = 60
2018/12/10 23:08:09 [verbose] prefix in use: /usr/local/kong
2018/12/10 23:08:09 [debug] loading subsystems migrations...
2018/12/10 23:08:09 [verbose] retrieving keyspace schema state...
2018/12/10 23:08:09 [verbose] schema state retrieved
2018/12/10 23:08:09 [info] upgrading from 0.14, bootstrapping database...
2018/12/10 23:08:09 [debug] creating 'kong_dev' keyspace if not existing...
2018/12/10 23:08:09 [debug] successfully created 'kong_dev' keyspace
2018/12/10 23:08:10 [debug] creating 'schema_meta' table if not existing...
2018/12/10 23:08:10 [debug] successfully created 'schema_meta' table
2018/12/10 23:08:10 [debug] creating 'locks' table if not existing...
2018/12/10 23:08:10 [debug] successfully created 'locks' table
2018/12/10 23:08:10 [verbose] waiting for Cassandra schema consensus (60000ms timeout)...
2018/12/10 23:08:11 [verbose] Cassandra schema consensus: reached
2018/12/10 23:08:11 [debug] loading subsystems migrations...
2018/12/10 23:08:11 [verbose] retrieving keyspace schema state...
2018/12/10 23:08:12 [verbose] schema state retrieved
2018/12/10 23:08:12 [debug] migrations to run:
core: 000_base, 001_14_to_15
oauth2: 000_base_oauth2, 001_14_to_15
acl: 000_base_acl, 001_14_to_15
jwt: 000_base_jwt, 001_14_to_15
basic-auth: 000_base_basic_auth, 001_14_to_15
key-auth: 000_base_key_auth, 001_14_to_15
rate-limiting: 000_base_rate_limiting, 001_14_to_15
hmac-auth: 000_base_hmac_auth, 001_14_to_15
response-ratelimiting: 000_base_response_rate_limiting, 001_14_to_15
2018/12/10 23:08:12 [info] migrating core on keyspace 'kong_dev'...
2018/12/10 23:08:12 [debug] running migration: 000_base
2018/12/10 23:08:12 [info] core migrated up to: 000_base (executed)
2018/12/10 23:08:12 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] core migrated up to: 001_14_to_15 (pending)
2018/12/10 23:08:13 [info] migrating oauth2 on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_oauth2
2018/12/10 23:08:13 [info] oauth2 migrated up to: 000_base_oauth2 (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] oauth2 migrated up to: 001_14_to_15 (pending)
2018/12/10 23:08:13 [info] migrating acl on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_acl
2018/12/10 23:08:13 [info] acl migrated up to: 000_base_acl (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] acl migrated up to: 001_14_to_15 (pending)
2018/12/10 23:08:13 [info] migrating jwt on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_jwt
2018/12/10 23:08:13 [info] jwt migrated up to: 000_base_jwt (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] jwt migrated up to: 001_14_to_15 (executed)
2018/12/10 23:08:13 [info] migrating basic-auth on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_basic_auth
2018/12/10 23:08:13 [info] basic-auth migrated up to: 000_base_basic_auth (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] basic-auth migrated up to: 001_14_to_15 (executed)
2018/12/10 23:08:13 [info] migrating key-auth on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_key_auth
2018/12/10 23:08:13 [info] key-auth migrated up to: 000_base_key_auth (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] key-auth migrated up to: 001_14_to_15 (executed)
2018/12/10 23:08:13 [info] migrating rate-limiting on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_rate_limiting
2018/12/10 23:08:13 [info] rate-limiting migrated up to: 000_base_rate_limiting (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] rate-limiting migrated up to: 001_14_to_15 (executed)
2018/12/10 23:08:13 [info] migrating hmac-auth on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_hmac_auth
2018/12/10 23:08:13 [info] hmac-auth migrated up to: 000_base_hmac_auth (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] hmac-auth migrated up to: 001_14_to_15 (executed)
2018/12/10 23:08:13 [info] migrating response-ratelimiting on keyspace 'kong_dev'...
2018/12/10 23:08:13 [debug] running migration: 000_base_response_rate_limiting
2018/12/10 23:08:13 [info] response-ratelimiting migrated up to: 000_base_response_rate_limiting (executed)
2018/12/10 23:08:13 [debug] running migration: 001_14_to_15
2018/12/10 23:08:13 [info] response-ratelimiting migrated up to: 001_14_to_15 (executed)
2018/12/10 23:08:13 [info] 18 migrations processed
2018/12/10 23:08:13 [info] 15 executed
2018/12/10 23:08:13 [info] 3 pending
2018/12/10 23:08:13 [verbose] waiting for Cassandra schema consensus (60000ms timeout)...
2018/12/10 23:08:15 [verbose] Cassandra schema consensus: reached
Trying to start up a 1.0rc3 version
2018/12/10 23:21:37 [warn] 22#0: [lua] getssl.lua:10: OpenResty patch missing. See https://github.com/Kong/openresty-patches
nginx: [warn] [lua] getssl.lua:10: OpenResty patch missing. See https://github.com/Kong/openresty-patches
2018/12/10 23:21:37 [warn] 22#0: [lua] init.lua:260: init(): missing "ngx.balancer".set_ssl_ctx API. Dynamic client SSL_CTX* will be unavailable
nginx: [warn] [lua] init.lua:260: init(): missing "ngx.balancer".set_ssl_ctx API. Dynamic client SSL_CTX* will be unavailable
2018/12/10 23:21:37 [warn] 22#0: [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address
nginx: [warn] [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address
2018/12/10 23:21:38 [warn] 22#0: [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address
nginx: [warn] [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address
2018/12/10 23:21:38 [error] 22#0: init_by_lua error: /usr/local/share/lua/5.1/kong/init.lua:292: [cassandra] could not execute page query: [Invalid] Undefined column name run_on
stack traceback:
[C]: in function 'assert'
/usr/local/share/lua/5.1/kong/init.lua:292: in function 'init'
init_by_lua:3: in main chunk
nginx: [error] init_by_lua error: /usr/local/share/lua/5.1/kong/init.lua:292: [cassandra] could not execute page query: [Invalid] Undefined column name run_on
stack traceback:
[C]: in function 'assert'
/usr/local/share/lua/5.1/kong/init.lua:292: in function 'init'
init_by_lua:3: in main chunk
I think @james-callahan did do a PR for this fix, but for now I can't progress with testing on 1.0rc3; I did confirm the ACL fix and migration consensus worked, though. Looks like I will roll back again to 0.14.1 until rc4 is here, which should have the fix.
cc'ing @p0pr0ck5: adding you here since I know I mentioned I would have the results tonight from the other 404 issue; looks like it won't be so until later.
@jeremyjpj0916 Yes, several reports of this issue came up. We are introducing a way to upgrade between release candidates in rc4 (see #4071 and the --force flag).
As of today, resetting the database and doing a fresh bootstrap is required for upgrading between release candidates. Or using a fresh keyspace.
Okay, makes sense. Closing this, as I think we got the things this issue brings up fixed. Will reopen or make new issues if needed around testing 1.0rc4. I would prefer testing 1.0rc4 from a 0.14.1 node, so I will be rolling back my dev environment completely rather than doing smaller rc3 to rc4 upgrades.
Figured Kong may want some external feedback on what migrations look like to an outside user. So here is my initial thought based on what I see:
Summary
Generally I expect conclusive results at the end of a migrations run informing me whether everything completed successfully or not. My first run of the migrations from a database running under Kong 0.14.1, using Cassandra 3.11.x, seems to have a mixed bag of results. Log detailed below.
Prior migrations have usually finalized themselves with a brief completion comment and no further details. This migration ran through until OpenShift reported "completed", so the kong migrations up command returned success on execution, but the logs leave me scratching my head, as they say 3 things are pending as the last known info.
I am about to attempt a rerun of the migrations one more time to see if it reveals any extra detail or says everything looks good (and will return to post back the 2nd migration execution results). Regardless, this may be a point in the migration process Kong could clean up, so the result is clear to the end user.