Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

Changed behaviour between 0.1.3 and 0.5.0: Ingress resources with duplicated path are now sent into Kong configuration #399

suankan closed this issue 4 years ago

suankan commented 5 years ago



Summary

After upgrading from KIC 0.1.3 / Kong EE 0.34-1 to KIC 0.5.0 / Kong EE 0.36-2, Ingress resources with duplicated paths are all pushed into the Kong configuration. Kong Admin then starts returning 500 "An unexpected error occurred" for some of the tagged list queries that KIC makes, and the published APIs become unreachable via Kong Proxy.

Kong Ingress controller version: 0.1.3 (old), 0.5.0 (new)

Kong or Kong Enterprise version: 0.34-1, 0.36-1, 0.36-2

Kubernetes version


$ kubectl version

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T14:27:12Z", GoVersion:"go1.12.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.10", GitCommit:"37d169313237cb4ceb2cc4bef300f2ae3053c1a2", GitTreeState:"clean", BuildDate:"2019-08-19T10:44:49Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

Environment

CentOS is used on the host EC2 nodes.

What happened

The K8s namespace where my Kong Ingress Controller watches for Ingress resources has a few Ingresses with duplicate paths.
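To illustrate what I mean by duplicate paths, here is a minimal hypothetical example (the names, namespace, path and services are placeholders, not my actual resources; those are linked at the end of this report): two Ingresses in the same namespace that declare the same path for different backends.

kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-a          # placeholder name
  namespace: my-namespace  # placeholder namespace
spec:
  rules:
  - http:
      paths:
      - path: /some-api    # same path as ingress-b below
        backend:
          serviceName: service-a
          servicePort: 443
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-b
  namespace: my-namespace
spec:
  rules:
  - http:
      paths:
      - path: /some-api    # duplicate of the path above
        backend:
          serviceName: service-b
          servicePort: 443
EOF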

My current K8s deployment architecture and the scenario of bringing the installation to Kong EE 0.36-2 and KIC 0.5.0 are described below; a rough command-level sketch follows the list.

  1. Ingress and KongIngress resources are deployed and are not touched during the steps below.
  2. KIC is able to discover all API routes published via the Ingress and KongIngress resources, and Kong Proxy works fine.
  3. Start a 50 TPS load against the published APIs.
  4. Delete the Cassandra DB deployment from the K8s namespace completely, including its PersistentVolumeClaims.
  5. Delete the KIC 0.1.3 deployment from the K8s namespace completely.
  6. The Kong Proxy component keeps working fine standalone and keeps serving the 50 TPS traffic to the published APIs, presumably using the last applied configuration.
  7. Deploy a fresh/empty Cassandra in the K8s namespace.
  8. Execute the Kong migrations for version 0.36-2 on top of the fresh/empty Cassandra.
  9. Deploy KIC 0.5.0 along with Kong Admin 0.36-2 as a single pod. KIC talks to Kong Admin via http://localhost:8001.
  10. Update the Kong Proxy deployment with the new image 0.36-2.
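Roughly, in command terms, steps 4 - 10 look like the sketch below (all resource, manifest and image names are placeholders for my actual manifests, and the Cassandra connection environment variables are simplified):

# Step 4: delete the Cassandra deployment and its data (placeholder names)
kubectl -n my-namespace delete statefulset cassandra
kubectl -n my-namespace delete pvc -l app=cassandra

# Step 5: delete the old ingress controller
kubectl -n my-namespace delete deployment kong-ingress-controller

# Step 7: deploy a fresh, empty Cassandra (manifest not shown)
kubectl -n my-namespace apply -f cassandra.yaml

# Step 8: run the Kong 0.36-2 migrations against the empty Cassandra
kubectl -n my-namespace run kong-migrations --restart=Never \
  --image=<kong-ee-0.36-2-image> \
  --env=KONG_DATABASE=cassandra \
  --env=KONG_CASSANDRA_CONTACT_POINTS=cassandra \
  -- kong migrations bootstrap

# Steps 9-10: deploy KIC 0.5.0 + Kong Admin 0.36-2 as one pod, then bump the proxy image
kubectl -n my-namespace apply -f kic-0.5.0-with-admin-0.36-2.yaml
kubectl -n my-namespace set image deployment/kong-proxy proxy=<kong-ee-0.36-2-image>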

The problem starts at step 10. KIC 0.5.0 is able to discover all Ingress and KongIngress resources, and it seems to actually ask Kong Admin 0.36-2 to publish all found Ingresses, because at this point I can query Kong Admin 0.36-2 for /routes, /services and /upstreams.

Then KIC 0.5.0 decides to make calls to Kong Admin like these:

GET /services?size=1000&tags=managed-by-ingress-controller
GET /routes?size=1000&tags=managed-by-ingress-controller
GET /plugins?size=1000&tags=managed-by-ingress-controller
GET /upstreams?size=1000&tags=managed-by-ingress-controller

and so on.

And Kong Admin for some reason returns {"message": "An unexpected error occurred"} for some of the above calls.

And finally, I am unable to call my APIs via Kong Proxy.

I have run the above redeployment/upgrade scenario (steps 1 - 10) multiple times.

At different iterations of the above steps, at step 10 I see that Kong Admin returns {"message": "An unexpected error occurred"} for some of the above calls.

For example:

I run steps 1 - 10 the first time. At step 10, I can see that Kong Admin returns {"message": "An unexpected error occurred"} when KIC calls 'GET /routes?size=1000&tags=managed-by-ingress-controller'.

I re-run steps 1 - 10 from scratch. This time, at step 10, I might see Kong Admin return {"message": "An unexpected error occurred"} when KIC calls 'GET /upstreams?size=1000&tags=managed-by-ingress-controller'.

Both times, if I just manually call 'GET /routes' or 'GET /upstreams' (without the query parameters), Kong Admin responds successfully.
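To make the difference concrete (the Admin API is reachable on http://localhost:8001 in my setup; the exact collection that fails varies between iterations):

# The tagged list query that KIC 0.5.0 issues returns a 500 in my environment:
curl -i 'http://localhost:8001/routes?size=1000&tags=managed-by-ingress-controller'
#   HTTP/1.1 500 Internal Server Error
#   {"message":"An unexpected error occurred"}

# The same collection queried manually, without the query parameters, responds successfully:
curl -i 'http://localhost:8001/routes'
#   HTTP/1.1 200 OK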

Another observation:

If, before doing steps 1 - 10, I have only ONE Ingress resource published, then I do not observe the above problem.

If, before doing steps 1 - 10, I have two Ingress resources with a duplicate path, then the above problem happens.

Another observation:

I re-deploy my entire Kong infrastructure with KIC 0.1.3 and Kong 0.34-1. I don't touch Ingress and KongIngress resources. Ingresses with duplicate paths exist.

Then I query the Kong Admin endpoints /routes, /services and /upstreams to see what KIC published via Kong Admin, and I see no duplicate paths in the /routes JSON response. Please see the attached file staging-routes_old_kong.json.

Then I re-deploy the Kong infrastructure with Kong 0.36-2 and KIC 0.5.0, query the Kong Admin /routes endpoint again, and see routes with duplicate paths in the JSON response. See the attached file staging-routes.json.
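One quick way to spot such duplicates in the Admin API output (assuming jq is available and the Admin API is on http://localhost:8001):

# Print every path that appears in more than one route object
curl -s http://localhost:8001/routes | jq -r '.data[].paths[]?' | sort | uniq -d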

It is important to note that both times I re-deployed my Kong infrastructure without touching the Ingress and KongIngress CRDs.

So I suspect this is changed behaviour between these versions of KIC and Kong.

Expected behaviour

The situation in the older Kong and KIC looks like a race condition: whichever Ingress KIC found first was sent into the Kong config. The older KIC seems to have been able to identify that the next one was a duplicate and did not push it into Kong.

The new KIC obviously sends ALL found Ingresses to Kong regardless of whether they have duplicate paths, and then Kong chokes on them...

The older behaviour would be preferable, because the problem is kept outside of Kong and Kong Proxy is able to serve whatever was pushed into its configuration.

Steps To Reproduce

See above.

hbagdi commented 5 years ago

Thank you for the detailed issue report.

At different iterations of the above steps, at step 10 I see that Kong Admin returns {"message": "An unexpected error occurred"} for some of the above calls.

Could you please share the error logs of Kong? There should be either a stack-trace or an error which relates to the above response.

The new KIC obviously sends ALL found Ingresses to Kong regardless of whether they have duplicate paths, and then Kong chokes on them...

This is expected behavior. The Ingress rule parsing behavior has changed in 0.4.

If, before doing steps 1 - 10, I have two Ingress resources with a duplicate path, then the above problem happens.

Can you share the two Ingress resources? (Please replace any sensitive detail.)

suankan commented 5 years ago

Hi @hbagdi,

Here is what I see in KIC 0.5.0 logs:

I0916 09:26:51.165355       1 status.go:199] new leader elected: kong-ingress-controller-68555df7c8-bhd2m
I0916 09:26:51.165361       1 status.go:184] I am the new status update leader
I0916 09:26:51.165393       1 queue.go:70] queuing item &Ingress{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[],},Spec:IngressSpec{Backend:nil,TLS:[],Rules:[],},Status:IngressStatus{LoadBalancer:k8s_io_api_core_v1.LoadBalancerStatus{Ingress:[],},},}
I0916 09:26:51.165480       1 queue.go:111] syncing 
I0916 09:26:51.165509       1 controller.go:118] syncing Ingress configuration...
I0916 09:26:51.165610       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/openbanking-identityprovider
I0916 09:26:51.165633       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/openbanking-identityprovider and port &ServicePort{Name:,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165648       1 parser.go:825] endpoints found: [{10.8.205.179 443} {10.8.205.27 443} {10.8.205.87 443}]
I0916 09:26:51.165669       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/template-edgeapi-dotnetcore2-2-v1
I0916 09:26:51.165676       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/template-edgeapi-dotnetcore2-2-v1 and port &ServicePort{Name:https-port,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165687       1 parser.go:825] endpoints found: [{10.8.229.164 443} {10.8.229.4 443} {10.8.229.72 443}]
I0916 09:26:51.165717       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/openbanking-datasharingapi-customers
I0916 09:26:51.165731       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/openbanking-datasharingapi-customers and port &ServicePort{Name:,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165741       1 parser.go:825] endpoints found: [{10.8.229.165 443} {10.8.229.21 443} {10.8.229.79 443}]
I0916 09:26:51.165753       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/openbanking-datasharingapi-products
I0916 09:26:51.165764       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/openbanking-datasharingapi-products and port &ServicePort{Name:https-port,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165775       1 parser.go:825] endpoints found: [{10.8.229.167 443} {10.8.229.23 443} {10.8.229.78 443}]
I0916 09:26:51.165789       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/entitlements-authorisation-data-api
I0916 09:26:51.165796       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/entitlements-authorisation-data-api and port &ServicePort{Name:https-port,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165805       1 parser.go:825] endpoints found: [{10.8.229.100 443} {10.8.229.166 443} {10.8.229.48 443}]
I0916 09:26:51.165817       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/openbanking-datasharingapi-accounts
I0916 09:26:51.165827       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/openbanking-datasharingapi-accounts and port &ServicePort{Name:https-port,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165836       1 parser.go:825] endpoints found: [{10.8.229.175 443} {10.8.229.24 443} {10.8.229.77 443}]
I0916 09:26:51.165850       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/openbanking-datasharingapi-transactions
I0916 09:26:51.165858       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/openbanking-datasharingapi-transactions and port &ServicePort{Name:,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165871       1 parser.go:825] endpoints found: [{10.8.229.117 443} {10.8.229.14 443} {10.8.229.151 443}]
I0916 09:26:51.165882       1 parser.go:627] obtaining port information for service openbanking-datasharing-test/entitlements-authorisation-decision-api
I0916 09:26:51.165889       1 parser.go:782] getting endpoints for service openbanking-datasharing-test/entitlements-authorisation-decision-api and port &ServicePort{Name:https-port,Protocol:TCP,Port:443,TargetPort:443,NodePort:0,}
I0916 09:26:51.165896       1 parser.go:825] endpoints found: [{10.8.229.132 443} {10.8.229.51 443} {10.8.229.86 443}]
E0916 09:26:54.044508       1 controller.go:125] unexpected failure updating Kong configuration: 
2 errors occured:
    while processing event: {Create} failed: 500 Internal Server Error {"message":"An unexpected error occurred"}
    while processing event: {Create} failed: 500 Internal Server Error {"message":"An unexpected error occurred"}
W0916 09:26:54.044549       1 queue.go:113] requeuing , err 2 errors occured:
    while processing event: {Create} failed: 500 Internal Server Error {"message":"An unexpected error occurred"}
    while processing event: {Create} failed: 500 Internal Server Error {"message":"An unexpected error occurred"}
I0916 09:26:54.049678       1 queue.go:111] syncing 
I0916 09:26:54.498948       1 controller.go:118] syncing Ingress configuration...

In Kong Admin I see these logs:

127.0.0.1 - - [16/Sep/2019:09:30:07 +0000] "GET /routes?size=1000&tags=managed-by-ingress-controller HTTP/1.1" 500 42 "-" "Go-http-client/1.1"
2019/09/16 09:30:07 [error] 32#0: *7377 lua coroutine: runtime error: ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:315: bad argument #1 to 'uuid()' (got nil)
stack traceback:
coroutine 0:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/cassandra/init.lua:657: in function 'uuid'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:315: in function 'serialize_arg'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:986: in function 'select'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:1205: in function 'dereference_rows'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:1369: in function 'page'
    /usr/local/share/lua/5.1/kong/db/dao/init.lua:1058: in function 'page_collection'
    /usr/local/share/lua/5.1/kong/api/endpoints.lua:303: in function 'fn'
    /usr/local/share/lua/5.1/kong/api/init.lua:50: in function </usr/local/share/lua/5.1/kong/api/init.lua:33>
coroutine 1:
    [C]: in function 'resume'
    /usr/local/share/lua/5.1/lapis/application.lua:393: in function 'handler'
    /usr/local/share/lua/5.1/lapis/application.lua:130: in function 'resolve'
    /usr/local/share/lua/5.1/lapis/application.lua:161: in function </usr/local/share/lua/5.1/lapis/application.lua:159>
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/lapis/application.lua:159: in function 'dispatch'
    /usr/local/share/lua/5.1/lapis/nginx.lua:215: in function 'serve_admin_api'
    content_by_lua(nginx-kong.conf:154):2: in function <content_by_lua(nginx-kong.conf:154):1>, client: 127.0.0.1, server: kong_admin, request: "GET /routes?size=1000&tags=managed-by-ingress-controller HTTP/1.1", host: "127.0.0.1:8001"
2019/09/16 09:30:07 [error] 32#0: *7377 [lua] init.lua:171: handle_error(): /usr/local/share/lua/5.1/lapis/application.lua:397: ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:315: bad argument #1 to 'uuid()' (got nil)
stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/cassandra/init.lua:657: in function 'uuid'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:315: in function 'serialize_arg'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:986: in function 'select'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:1205: in function 'dereference_rows'
    ...ocal/share/lua/5.1/kong/db/strategies/cassandra/init.lua:1369: in function 'page'
    /usr/local/share/lua/5.1/kong/db/dao/init.lua:1058: in function 'page_collection'
    /usr/local/share/lua/5.1/kong/api/endpoints.lua:303: in function 'fn'
    /usr/local/share/lua/5.1/kong/api/init.lua:50: in function </usr/local/share/lua/5.1/kong/api/init.lua:33>

stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/lapis/application.lua:397: in function 'handler'
    /usr/local/share/lua/5.1/lapis/application.lua:130: in function 'resolve'
    /usr/local/share/lua/5.1/lapis/application.lua:161: in function </usr/local/share/lua/5.1/lapis/application.lua:159>
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/lapis/application.lua:159: in function 'dispatch'
    /usr/local/share/lua/5.1/lapis/nginx.lua:215: in function 'serve_admin_api'
    content_by_lua(nginx-kong.conf:154):2: in function <content_by_lua(nginx-kong.conf:154):1>, client: 127.0.0.1, server: kong_admin, request: "GET /routes?size=1000&tags=managed-by-ingress-controller HTTP/1.1", host: "127.0.0.1:8001"

Ingress: http://pastebin.zone/phElSPdC
KongIngress: http://pastebin.zone/qtDdDJVQ

The main problem is that Kong EE fails to serve the other published APIs, not just the ones with duplicated paths.

hbagdi commented 4 years ago

This is being tracked via the Kong Enterprise support portal. Closing this.