canonical / istio-operators

Charmed Istio

Add test that removes and redeploys istio, fix juju automated cleanup #38

Open ca-scribner opened 2 years ago

ca-scribner commented 2 years ago

Presently, removing and redeploying istio can leave the cluster broken: deploy istio, remove it, then deploy other charms, and they may (will?) get stuck at "installing agent". This is caused at least in part by this juju bug, which prevents remove hooks from working properly.

Even without the remove hook, I think Juju should(?) also track the objects that the charm creates and destroy them when we destroy the application. This might be related to the conversation on how juju may be bugged with tracking objects that are created.

DnPlas commented 2 years ago

I believe this issue is not present anymore as we are now handling and testing the remove routine. Also, the juju bug you have mentioned has been fixed. @ca-scribner have you seen this behaviour in the most recent versions of istio?

ca-scribner commented 1 year ago

This is still observed. Our istio-pilot creates some cluster-global objects that Juju does not track. To fix this, you can either:

rgildein commented 2 months ago

I was hitting the same issue while testing another charm that requires istio-pilot. Here is how I fixed it:

juju exec --unit istio-pilot/0 -- ./istioctl uninstall -y --purge
juju destroy-model --destroy-storage --force kubeflow
# running integration tests again
tox -e integration -- --model kubeflow --keep-models

However, I believe that istio-pilot should run the uninstall command during the remove hook to properly clean up after itself.
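
A minimal sketch of what that could look like, assuming the charm shells out to the same `./istioctl uninstall -y --purge` command used in the manual workaround. The function name `purge_istio`, the timeout, and the injectable `run` parameter are hypothetical and not taken from the charm's code; the relative `./istioctl` path is assumed to resolve from the charm directory, as it does under `juju exec`.

```python
import subprocess
from typing import Callable

# Same command as the manual workaround above.
UNINSTALL_CMD = ("./istioctl", "uninstall", "-y", "--purge")


def purge_istio(run: Callable[..., object] = subprocess.run) -> bool:
    """Run the uninstall command; return True on success, False otherwise.

    `run` is injectable so the function can be exercised without istioctl.
    """
    try:
        run(list(UNINSTALL_CMD), check=True, capture_output=True, timeout=300)
    except (
        subprocess.CalledProcessError,
        subprocess.TimeoutExpired,
        FileNotFoundError,
    ) as err:
        # Report the failure and return, so that a failed cleanup
        # does not leave the remove hook itself in an error state.
        print(f"istioctl uninstall failed: {err}")
        return False
    return True
```

In an ops-based charm, a function like this would presumably be called from a handler observed on `self.on.remove`.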

syncronize-issues-to-jira[bot] commented 2 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5870.

This message was autogenerated

rgildein commented 2 months ago

After some research (the remove hook does clean up istio) and discussion in the team, I believe this error can be solved by checking the cluster in the install hook. Such a check should determine whether istio can be installed on the cluster and, if not, put the charm in a blocked state with a clear message to the user. Seeing the charm in an error state with logs like the ones below is not really helpful.
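
A rough sketch of such a check, under the assumption that leftover Istio CRDs (names ending in `.istio.io`, e.g. `envoyfilters.networking.istio.io`) are a usable signal that the cluster still holds an earlier install. That signal, `PreflightResult`, and `check_cluster_is_clean` are all hypothetical; how the CRD names are listed from the cluster is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PreflightResult:
    ok: bool
    message: str


def check_cluster_is_clean(
    crd_names: Iterable[str], suffix: str = ".istio.io"
) -> PreflightResult:
    """Decide whether istio can be installed, given the cluster's CRD names.

    Assumption: any CRD whose name ends with `suffix` is a leftover
    from a previous istio install.
    """
    leftovers = sorted(name for name in crd_names if name.endswith(suffix))
    if leftovers:
        return PreflightResult(
            ok=False,
            message=(
                f"Found {len(leftovers)} existing Istio CRD(s), "
                f"e.g. {leftovers[0]}; clean up before installing"
            ),
        )
    return PreflightResult(ok=True, message="")
```

In the install handler, a failed result could then be surfaced with `self.unit.status = BlockedStatus(result.message)` instead of letting a later hook raise.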

Sharing more logs:

unit-istio-pilot-0: 12:18:53 INFO juju.worker.uniter awaiting error resolution for "start" hook                                                                                                                                                                                           
unit-istio-pilot-0: 12:18:54 INFO unit.istio-pilot/0.juju-log Running legacy hooks/start.                                                                                                                                                                                                 
unit-istio-pilot-0: 12:18:54 INFO unit.istio-pilot/0.juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/istio-ingressgateway-workload "HTTP/1.1 200 OK"                                                                                                  
unit-istio-pilot-0: 12:18:54 ERROR unit.istio-pilot/0.juju-log Uncaught exception while in charm code:                                                                                                                                                                                    
Traceback (most recent call last):                                                                                                                                                                                                                                                        
  File "./src/charm.py", line 1224, in <module>                                                                                              
    main(Operator)                                                                                                                           
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 429, in main                                                   
    charm = charm_class(framework)                                                                                                           
  File "./src/charm.py", line 116, in __init__                                                                                                                                                                                                                                            
    cert_subject=self._cert_subject,                                                                                                         
  File "./src/charm.py", line 513, in _cert_subject                                                                                                                                                                                                                                       
    svc_address = _get_gateway_address_from_svc(svc)                                                                                                                                                                                                                                      
  File "./src/charm.py", line 1078, in _get_gateway_address_from_svc                                                                         
    gateway_address = _get_address_from_loadbalancer(svc)                                                                                                                                                                                                                                 
  File "./src/charm.py", line 1093, in _get_address_from_loadbalancer                                                                                                                                                                                                                     
    if len(ingresses) != 1:                                                                                                                                                                                                                                                               
TypeError: object of type 'NoneType' has no len()                                                                                                                                                                                                                                         
unit-istio-pilot-0: 12:18:55 ERROR juju.worker.uniter.operation hook "start" (via hook dispatching script: dispatch) failed: exit status 1                                                                                                                                                
unit-istio-pilot-0: 12:18:55 INFO juju.worker.uniter awaiting error resolution for "start" hook                                                                                                                                                                                           
unit-istio-pilot-0: 12:19:00 INFO juju.worker.uniter awaiting error resolution for "start" hook                                                                                                                                                                                           
unit-istio-pilot-0: 12:19:00 INFO unit.istio-pilot/0.juju-log Running legacy hooks/start.                                                    
unit-istio-pilot-0: 12:19:00 INFO unit.istio-pilot/0.juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/istio-ingressgateway-workload "HTTP/1.1 200 OK"
unit-istio-pilot-0: 12:19:00 ERROR unit.istio-pilot/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):                                                                                                                                                                                                                                                        
  File "./src/charm.py", line 1224, in <module>                                                                                                                                                                                                                                           
    main(Operator)                                                                                                                                                                                                                                                                        
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 429, in main                                                   
    charm = charm_class(framework)                                    
  File "./src/charm.py", line 116, in __init__                        
    cert_subject=self._cert_subject,                                  
  File "./src/charm.py", line 513, in _cert_subject                   
    svc_address = _get_gateway_address_from_svc(svc)                  
  File "./src/charm.py", line 1078, in _get_gateway_address_from_svc                                                                         
    gateway_address = _get_address_from_loadbalancer(svc)             
  File "./src/charm.py", line 1093, in _get_address_from_loadbalancer                                                                        
    if len(ingresses) != 1:                                           
TypeError: object of type 'NoneType' has no len()                     
unit-istio-pilot-0: 12:19:01 ERROR juju.worker.uniter.operation hook "start" (via hook dispatching script: dispatch) failed: exit status 1   
unit-istio-pilot-0: 12:19:01 INFO juju.worker.uniter awaiting error resolution for "start" hook                                              
unit-istio-pilot-0: 12:19:02 INFO juju.worker.uniter awaiting error resolution for "start" hook                                              
unit-istio-pilot-0: 12:19:10 INFO juju.worker.uniter awaiting error resolution for "start" hook                                              
unit-istio-pilot-0: 12:19:10 INFO unit.istio-pilot/0.juju-log Running legacy hooks/start.                                                    
unit-istio-pilot-0: 12:19:11 INFO unit.istio-pilot/0.juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/istio-ingressgateway-workload "HTTP/1.1 404 Not Found"
unit-istio-pilot-0: 12:19:11 INFO unit.istio-pilot/0.juju-log Could not retrieve the gateway service address for using in the CSR.           
unit-istio-pilot-0: 12:19:11 WARNING unit.istio-pilot/0.juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
unit-istio-pilot-0: 12:19:11 WARNING unit.istio-pilot/0.juju-log Invalid Grafana dashboards folder at /var/lib/juju/agents/unit-istio-pilot-0/charm/src/grafana_dashboards: directory does not exist
unit-istio-pilot-0: 12:19:11 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)                      
unit-istio-pilot-0: 12:19:11 INFO unit.istio-pilot/0.juju-log istio-pilot:1: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/istio-ingressgateway-workload "HTTP/1.1 404 Not Found"
unit-istio-pilot-0: 12:19:11 INFO unit.istio-pilot/0.juju-log istio-pilot:1: Could not retrieve the gateway service address for using in the CSR.
unit-istio-pilot-0: 12:19:11 WARNING unit.istio-pilot/0.juju-log istio-pilot:1: 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
unit-istio-pilot-0: 12:19:11 WARNING unit.istio-pilot/0.juju-log istio-pilot:1: Invalid Grafana dashboards folder at /var/lib/juju/agents/unit-istio-pilot-0/charm/src/grafana_dashboards: directory does not exist
unit-istio-pilot-0: 12:19:12 INFO unit.istio-pilot/0.juju-log istio-pilot:1: No ingress-auth data found - deleting any existing EnvoyFilter  
unit-istio-pilot-0: 12:19:12 INFO unit.istio-pilot/0.juju-log istio-pilot:1: HTTP Request: DELETE https://10.152.183.1/apis/networking.istio.io/v1alpha3/namespaces/kubeflow/envoyfilters/istio-pilot-authn-filter "HTTP/1.1 404 Not Found"
unit-istio-pilot-0: 12:19:12 ERROR unit.istio-pilot/0.juju-log istio-pilot:1: Uncaught exception while in charm code:
Traceback (most recent call last):                                    
  File "./src/charm.py", line 1224, in <module>                       
    main(Operator)                                                    
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 441, in main                                                   
    _emit_charm_event(charm, dispatcher.event_name)                   
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 149, in _emit_charm_event                                      
    event_to_emit.emit(*args, **kwargs)                               
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 344, in emit                                              
    framework._emit(event)                                            
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 841, in _emit                                             
    self._reemit(event_path)                                          
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 930, in _reemit                                           
    custom_handler(event)                                             
  File "./src/charm.py", line 358, in reconcile                       
    self._reconcile_ingress_auth(ingress_auth_data)                   
  File "./src/charm.py", line 762, in _reconcile_ingress_auth                                                                                
    _remove_envoyfilter(name=envoyfilter_name, namespace=self.model.name)                                                                    
  File "./src/charm.py", line 1118, in _remove_envoyfilter            
    lightkube_client.delete(ENVOYFILTER_LIGHTKUBE_RESOURCE, name=name, namespace=namespace)                                                  
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/lightkube/core/client.py", line 86, in delete                                     
    return self._client.request("delete", res=res, name=name, namespace=namespace, params={                                                  
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/lightkube/core/generic_client.py", line 245, in request                           
    return self.handle_response(method, resp, br)                     
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/lightkube/core/generic_client.py", line 196, in handle_response                   
    self.raise_for_status(resp)                                       
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/lightkube/core/generic_client.py", line 190, in raise_for_status                  
    raise transform_exception(e)                                      
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/lightkube/core/generic_client.py", line 188, in raise_for_status                  
    resp.raise_for_status()                                           
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/httpx/_models.py", line 749, in raise_for_status                                  
    raise HTTPStatusError(message, request=request, response=self)                                                                           
httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://10.152.183.1/apis/networking.istio.io/v1alpha3/namespaces/kubeflow/envoyfilters/istio-pilot-authn-filter'
For more information check: https://httpstatuses.com/404              
unit-istio-pilot-0: 12:19:12 ERROR juju.worker.uniter.operation hook "istio-pilot-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1

sombrafam commented 1 month ago

> I was hitting the same issue while testing another charm that requires istio-pilot. Here is how I fixed it:
>
> juju exec --unit istio-pilot/0 -- ./istioctl uninstall -y --purge
> juju destroy-model --destroy-storage --force kubeflow
> # running integration tests again
> tox -e integration -- --model kubeflow --keep-models
>
> However, I believe that istio-pilot should run the uninstall command during the remove hook to properly clean up after itself.

I believe that this should also be added to the charm install hooks, since if you try to install twice in the same cluster it will fail the second time.