gnocchixyz / gnocchi

Timeseries database
Apache License 2.0
299 stars 85 forks source link

Gnocchi-api returning test/html instead of application/json thus python-gnocchiclient not casting exception #1261

Closed frittentheke closed 2 years ago

frittentheke commented 2 years ago

Which version of Gnocchi are you using

gnocchi 7.0.6

How to reproduce your problem

I just ran into an issue when playing with OpenStack Ceilometer with Gnocchi as backend on a DevStack running stable/yoga on Ubuntu 20.04 (focal)

Ceilometer is "ensuring" certain archive policy exist, see https://github.com/openstack/ceilometer/blob/bf263b11181f4e44850e991282766b3bdf4f41e1/ceilometer/publisher/gnocchi.py#L269.

This is done via python-gnocchiclient and the create method at https://github.com/gnocchixyz/python-gnocchiclient/blob/7355fb2d7d3311f5962230a565574ce8c76c1caa/gnocchiclient/v1/archive_policy.py#L35.

which then sends a POST to the create REST API endpoint of Gnocchi at https://github.com/gnocchixyz/gnocchi/blob/f124cf76479720d3b9caf439241da12b2974ac90/gnocchi/rest/api.py#L302.

The mentioned ensures_archives_policies method then checks for the "ArchivePolicyAlreadyExists" exception (https://github.com/openstack/ceilometer/blob/bf263b11181f4e44850e991282766b3bdf4f41e1/ceilometer/publisher/gnocchi.py#L274) and apparently this is the issue here - this exception is NOT thrown, but simply a generic 409 Conflict error.

I found that exceptions are created and then enriched via the https://github.com/gnocchixyz/python-gnocchiclient/blob/7355fb2d7d3311f5962230a565574ce8c76c1caa/gnocchiclient/exceptions.py#L217 method and in our case it should be https://github.com/gnocchixyz/python-gnocchiclient/blob/7355fb2d7d3311f5962230a565574ce8c76c1caa/gnocchiclient/exceptions.py#L161

But, when looking at a trace between the apache2 and the unix socket exposing the gnocci-api I saw something strange.

While apache transferred this via the socket to the UWSGI spawned gnocchi-api instance:

.7....proxy-sendchunked..1..APACHE_RUN_USER..ubuntu..APACHE_RUN_GROUP..ubuntu .PATH_INFO../v1/archive_policy/ .HTTP_HOST .192.168.0.100..HTTP_USER_AGENTW.ceilometer-agent-notification keystoneauth1/4.5.0 python-requests/2.27.1 CPython/3.8.10..HTTP_ACCEPT_ENCODING .gzip, deflate..HTTP_ACCEPT..application/json, /..HTTP_CONNECTION .keep-alive..CONTENT_TYPE..application/json..HTTP_X_AUTH_TOKEN..gAAAAABiscoSYYjDWLVJzL8V12knrSmvI5aRYFTH4cZyQ0HKPBrheWqbh-6PS6Cf56kNTuTw5hwhpYa6UNEXkA_aIYj2xk6hTchgFaBOSrIHq36opuo46ktJCXcsEBQnus1_IZnWwuy8RBlzso6LGKeo37dzT_zloUYj_XhaMRDlgUifeAgUESQ..CONTENT_LENGTH..245..PATHF./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin..SERVER_SIGNATUREJ.

Apache/2.4.41 (Ubuntu) Server at 192.168.0.100 Port 80
..SERVER_SOFTWARE..Apache/2.4.41 (Ubuntu)..SERVER_NAME .192.168.0.100..SERVER_ADDR .192.168.0.100..SERVER_PORT..80..REMOTE_ADDR .192.168.0.100 .DOCUMENT_ROOT../opt/stack/horizon/.blackhole/..REQUEST_SCHEME..http..CONTEXT_PREFIX....CONTEXT_DOCUMENT_ROOT../opt/stack/horizon/.blackhole/..SERVER_ADMIN..[no address given]..SCRIPT_FILENAME6.proxy:uwsgi://uwsgi-uds-gnocchi-api/v1/archive_policy/..REMOTE_PORT..42324..GATEWAY_INTERFACE..CGI/1.1..SERVER_PROTOCOL..HTTP/1.1..REQUEST_METHOD..POST..QUERY_STRING....REQUEST_URI../metric/v1/archive_policy/..SCRIPT_NAME../metric{"name":"ceilometer-high-rate","aggregation_methods":["mean","rate:mean"],"back_window":0,"definition":[{"granularity":"1 second","timespan":"1 hour"},{"granularity":"1 minute","timespan":"1 day"},{"granularity":"1 hour","timespan":"365 days"}]}

it then received:

HTTP/1.1 409 Conflict Content-Length: 228 Content-Type: text/html; charset=UTF-8 Connection: close

409 Conflict

409 Conflict

There was a conflict when trying to complete your request.

Archive policy ceilometer-high-rate already exists

which is NOT the requested content type of "application/json". Rather the gnocchi-api explicitly returns "Content-Type: text/html; charset=UTF-8" and a "human readable" error instead of the expected JSON.

And that is why the exception is not properly casted and we end up in this archive_policy creation loop. Why the response is HTML and if this is an error of UWGSI or Gnocchi I was not yet able to find out.

What is the result that you get

ceilometer-anotification.service contantly tries to create already existing policies due to python-gnocchiclient receiving the exception as HTML instead of JSON.

What is result that you expected

I would expect python-gnocchiclient to be first receive content-type json from gnocchi-api and then to be casting this into the specific exception that it knows.

mrunge commented 2 years ago

Kudos to your detailed report here (and on the openstack mailing list!

I wonder if you are seeing the same error as me; gnocchi refuses to store any metrics recently (returning 404 when the new metric is posted)

frittentheke commented 2 years ago

I found that https://github.com/gnocchixyz/python-gnocchiclient/pull/110 actually fixed the issue. And the reason this even affects stable/yoga and likely also other OpenStack releases is apparently due to the version 7.0.6 of python-gnocchiclient being pinned. https://opendev.org/openstack/requirements/src/commit/264c424c399b0ef931a729caa3aa76888e5727c7/upper-constraints.txt#L138.

An upgrade to 7.0.7 was reverted (https://github.com/openstack/requirements/commit/4d3d57ee0b88973b039e379b1da73c1086e9a0af) due to issue with gate jobs. I suppose this needs to be picked up again as otherwise the integration with Gnocchi is borked.

I wonder if you are seeing the same error as me; gnocchi refuses to store any metrics recently (returning 404 when the new metric is posted)

Can you elaborate a little more which requests return 404 for you?

mrunge commented 2 years ago

so, in devstack@ceilometer-anotification.service I'm getting a conflict

peline.sample Traceback (most recent call last):
peline.sample   File "/opt/stack/ceilometer/ceilometer/pipeline/sample.py", line 85, in publish_samples
peline.sample     p.publish_samples(samples)
peline.sample   File "/opt/stack/ceilometer/ceilometer/publisher/gnocchi.py", line 326, in publish_samples
peline.sample     self.ensures_archives_policies()
peline.sample   File "/opt/stack/ceilometer/ceilometer/publisher/gnocchi.py", line 273, in ensures_archives_policies
peline.sample     self._gnocchi.archive_policy.create(ap)
peline.sample   File "/usr/local/lib/python3.9/site-packages/gnocchiclient/v1/archive_policy.py", line 42, in create
peline.sample     return self._post(
peline.sample   File "/usr/local/lib/python3.9/site-packages/gnocchiclient/v1/base.py", line 41, in _post
peline.sample     return self.client.api.post(*args, **kwargs)
peline.sample   File "/usr/local/lib/python3.9/site-packages/keystoneauth1/adapter.py", line 401, in post
peline.sample     return self.request(url, 'POST', **kwargs)
peline.sample   File "/usr/local/lib/python3.9/site-packages/gnocchiclient/client.py", line 52, in request
peline.sample     raise exceptions.from_response(resp, method)
peline.sample gnocchiclient.exceptions.Conflict: <html>
peline.sample  <head>
peline.sample   <title>409 Conflict</title>
peline.sample  </head>
peline.sample  <body>
peline.sample   <h1>409 Conflict</h1>
peline.sample   There was a conflict when trying to complete your request.<br /><br />
peline.sample Archive policy ceilometer-low already exists

in devstack@gnocchi-api.service you see

112398]: 2022-06-23 10:57:02,340 [112398] DEBUG    gnocchi.rest.api: Aborting request. Code [409]. Details [Archive pol>
112398]: [pid: 112398|app: 0|req: 40/79] 192.168.122.117 () {66 vars in 1329 bytes} [Thu Jun 23 10:57:02 2022] POST /me>
112397]: 2022-06-23 11:02:02,802 [112397] DEBUG    gnocchi.rest.api: Aborting request. Code [400]. Details [{'cause': '>
112397]: [pid: 112397|app: 0|req: 40/80] 192.168.122.117 () {66 vars in 1419 bytes} [Thu Jun 23 11:02:02 2022] POST /me>
112398]: 2022-06-23 11:02:02,816 [112398] DEBUG    gnocchi.rest.api: Aborting request. Code [404]. Details [Resource a2>
112398]: [pid: 112398|app: 0|req: 41/81] 192.168.122.117 () {66 vars in 1437 bytes} [Thu Jun 23 11:02:02 2022] PATCH /m>
112397]: 2022-06-23 11:02:02,832 [112397] DEBUG    gnocchi.rest.api: Aborting request. Code [404]. Details [Resource f2>
112397]: [pid: 112397|app: 0|req: 41/82] 192.168.122.117 () {66 vars in 1437 bytes} [Thu Jun 23 11:02:02 2022] PATCH /m>
112398]: 2022-06-23 11:07:02,568 [112398] DEBUG    gnocchi.rest.api: Aborting request. Code [409]. Details [Archive pol>
112398]: [pid: 112398|app: 0|req: 42/83] 192.168.122.117 () {66 vars in 1329 bytes} [Thu Jun 23 11:07:02 2022] POST /me>

oh, btw, here is gnocchiclient 7.0.6 installed (devstack install from today)

mrunge commented 2 years ago

Let's see how https://review.opendev.org/c/openstack/requirements/+/847256 goes then

tobias-urdin commented 2 years ago

Thanks for the detailed report and for digging up the fix, let us know how it goes 👍

tobias-urdin commented 2 years ago

This can be closed as fixed by 7.0.7 based on no more feedback here?

mrunge commented 2 years ago

This can be closed now, it is fixed by 7.0.7