canonical / charm-openstack-service-checks

Collection of Nagios checks and other utilities that can be used to verify the operation of an OpenStack cluster

Openstack user "nagios" doesn't have permissions to run check_octavia.py #137

Open sudeephb opened 5 months ago

sudeephb commented 5 months ago

The openstack user stored in /var/lib/nagios/nagios.novarc doesn't seem to have enough privilege to run check_octavia.py

# Running the check manually
root@juju-8d8c5a-4-lxd-17:/etc/nagios/nrpe.d# /usr/local/lib/nagios/plugins/check_octavia.py --check amphorae
Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 358, in <module>
    main()
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 352, in main
    status, message = process_checks(args)
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 293, in process_checks
    return nagios_exit(args, checks[args.check](connection))
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 203, in check_amphorae
    items = list(lb_mgr.amphorae())
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1693, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python3/dist-packages/openstack/exceptions.py", line 234, in raise_from_response
    raise cls(
openstack.exceptions.HttpException: HttpException: 403: Client Error for url: https://octavia.oam.prd.infra.client.net:9876/v2.0/octavia/amphorae, Forbidden

# Workaround for me was to give the load balancer roles manually to the nagios user
ubuntu@app1maas001p:~$ NAGIOS_USER_ID=$(openstack user list --domain service_domain | grep nagios | awk '{print $2}')
ubuntu@app1maas001p:~$ openstack role add --domain service_domain --user $NAGIOS_USER_ID load-balancer_member
ubuntu@app1maas001p:~$ openstack role add --project-domain service_domain --project services --user $NAGIOS_USER_ID load-balancer_member
ubuntu@app1maas001p:~$ openstack role add --domain service_domain --user $NAGIOS_USER_ID load-balancer_admin
ubuntu@app1maas001p:~$ openstack role add --project-domain service_domain --project services --user $NAGIOS_USER_ID load-balancer_admin

Imported from Launchpad using lp2gh.

sudeephb commented 5 months ago

(by phausman) Actually, only the load-balancer_admin role is required. Alternatively, the nagios user could be a system admin, with requests made in system scope (instead of project scope). See the related policies for Octavia's /octavia/amphorae API below:

"system-admin": "role:admin and system_scope:all"
"load-balancer:admin": "is_admin:True or role:load-balancer_admin or rule:system-admin"
"os_load-balancer_api:amphora:get_all": "rule:load-balancer:admin"
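To make the rule chain above easier to follow, here is a minimal Python sketch that models the three quoted rules as plain boolean functions. It is only an illustration: the `Credentials` class and the function names are hypothetical, `is_admin` is treated as an independent input, and real deployments evaluate these rules through Octavia's policy engine rather than code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credentials:
    """Hypothetical stand-in for the attributes the quoted rules inspect."""
    roles: frozenset = frozenset()
    system_scope: Optional[str] = None
    is_admin: bool = False


def system_admin(creds: Credentials) -> bool:
    # "system-admin": "role:admin and system_scope:all"
    return "admin" in creds.roles and creds.system_scope == "all"


def load_balancer_admin(creds: Credentials) -> bool:
    # "load-balancer:admin":
    #   "is_admin:True or role:load-balancer_admin or rule:system-admin"
    return (
        creds.is_admin
        or "load-balancer_admin" in creds.roles
        or system_admin(creds)
    )


def amphora_get_all(creds: Credentials) -> bool:
    # "os_load-balancer_api:amphora:get_all": "rule:load-balancer:admin"
    return load_balancer_admin(creds)


if __name__ == "__main__":
    member = Credentials(roles=frozenset({"load-balancer_member"}))
    lb_admin = Credentials(roles=frozenset({"load-balancer_admin"}))
    sys_admin = Credentials(roles=frozenset({"admin"}), system_scope="all")
    for name, creds in [("member", member), ("lb_admin", lb_admin), ("sys_admin", sys_admin)]:
        print(name, amphora_get_all(creds))
```

Under this simplified model, load-balancer_member alone does not satisfy the amphora listing rule, which matches the observation that only load-balancer_admin (or a system-scoped admin) is needed.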

sudeephb commented 5 months ago

(by eric-chen)

I am worried about granting the load-balancer_admin permission to the nagios user. Does that mean the nagios user can modify, create, or delete load balancers? Could we find the least privilege needed to run check_octavia.py?

https://www.beyondtrust.com/blog/entry/what-is-least-privilege

sudeephb commented 5 months ago

(by txiao) I have re-run the tests and confirmed that load-balancer_admin is already the least-privileged role we can give to the nagios user. As for the safety concerns, based on the OpenStack team's previous responses (1) regarding the security of the Octavia APIs, the risk seems to be very low.

sudeephb commented 5 months ago

(by eric-chen)

We need to review all the actions that can be done after we apply load-balancer_admin to nagios. Could nagios delete/create/modify the load-balancer after we apply the role? If yes, then it is not the least-privileged. We can discuss it offline.

Thanks for providing the two links. However, they do not seem to address the risk of granting broader permissions to the nagios user.

sudeephb commented 5 months ago

(by yoshikadokawa) I'm still seeing the same with openstack-service-checks charm from latest/edge channel (rev 33) on Yoga-Jammy.

$ /usr/local/lib/nagios/plugins/check_octavia.py --check loadbalancers
Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 308, in <module>
    main()
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 302, in main
    status, message = process_checks(args)
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 243, in process_checks
    return nagios_exit(args, checks[args.check](connection))
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 97, in check_loadbalancers
    lb_enabled = [lb for lb in lb_all if lb.is_admin_state_up]
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 97, in <listcomp>
    lb_enabled = [lb for lb in lb_all if lb.is_admin_state_up]
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1775, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python3/dist-packages/openstack/exceptions.py", line 236, in raise_from_response
    raise cls(
openstack.exceptions.HttpException: HttpException: 403: Client Error for url: https://octavia.endpoint:9876/v2.0/lbaas/loadbalancers, Policy does not allow this request to be performed.

$ /usr/local/lib/nagios/plugins/check_octavia.py --check pools
Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 308, in <module>
    main()
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 302, in main
    status, message = process_checks(args)
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 243, in process_checks
    return nagios_exit(args, checks[args.check](connection))
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 153, in check_pools
    pools_enabled = [pool for pool in pools_all if pool.is_admin_state_up]
  File "/usr/local/lib/nagios/plugins/check_octavia.py", line 153, in <listcomp>
    pools_enabled = [pool for pool in pools_all if pool.is_admin_state_up]
  File "/usr/lib/python3/dist-packages/openstack/resource.py", line 1775, in list
    exceptions.raise_from_response(response)
  File "/usr/lib/python3/dist-packages/openstack/exceptions.py", line 236, in raise_from_response
    raise cls(
openstack.exceptions.HttpException: HttpException: 403: Client Error for url: https://octavia.endpoint:9876/v2.0/lbaas/pools, Policy does not allow this request to be performed.

sudeephb commented 5 months ago

(by fandanbango) This has already been confirmed at least twice on Jammy/Yoga environments. Not sure it is the exact same issue, but it looks like it.

sudeephb commented 5 months ago

(by aieri) yoshikadokawa, fandanbango: the loadbalancers endpoint is different from the amphorae one. It should be sufficient to grant nagios the load-balancer_global_observer role. See https://docs.openstack.org/octavia/latest/configuration/policy.html#default-octavia-policies-api-effective-rules

sudeephb commented 5 months ago

(by mastier1) I can confirm that this fixes the issue. The question is whether we can incorporate it into the nagios charm, for instance:

openstack role add --user-domain service_domain --user nagios --project-domain service_domain --project services load-balancer_global_observer
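If this were to be automated, one possible shape is sketched below in Python. The helper name `role_add_command` and the `ROLE_FOR_CHECK` mapping are hypothetical; the mapping only records what this thread reports (load-balancer_admin for the amphorae check, load-balancer_global_observer for the loadbalancers check), and the defaults mirror the user, domain, and project used in the command above. The sketch only builds the command and does not run it.

```python
import shlex

# Roles reported in this thread as sufficient for each check (not verified here).
ROLE_FOR_CHECK = {
    "amphorae": "load-balancer_admin",
    "loadbalancers": "load-balancer_global_observer",
}


def role_add_command(
    role,
    user="nagios",
    user_domain="service_domain",
    project="services",
    project_domain="service_domain",
):
    """Build the `openstack role add` argv for a project-scoped role grant."""
    return [
        "openstack", "role", "add",
        "--user-domain", user_domain,
        "--user", user,
        "--project-domain", project_domain,
        "--project", project,
        role,
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for check, role in ROLE_FOR_CHECK.items():
        print(f"# {check}")
        print(shlex.join(role_add_command(role)))
```

Returning an argument list rather than a shell string means the result can be passed to `subprocess.run` without shell quoting concerns, should the command ever be executed from a charm hook or script.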

sudeephb commented 5 months ago

(by eric-chen) We will migrate LMA to COS soon, so we won't maintain or modify the nagios charm anymore. In the short term, we can update the documentation of charm-openstack-service-checks. In the long term, we should collect metrics from openstack-exporter and create the related alerts with Prometheus and Alertmanager.