Closed: longxb040 closed this issue 2 years ago.
Is there a problem with my Gnocchi configuration? Looking forward to your reply. Thanks.
Is your Redis set up in some bizarre way?
Your IndexError is really weird. Can you show your archive policies?
Do you have any way to reproduce this?
Hello, jd. Thank you for your reply. These are my archive policies:
[root@gz1-eden-control-001 ~(keystone_admin)]# gnocchi archive-policy show high
+---------------------+-------------------------------------------------------------------+
| Field | Value |
+---------------------+-------------------------------------------------------------------+
| aggregation_methods | 95pct, mean |
| back_window | 0 |
| definition | - points: 1440, granularity: 0:00:15, timespan: 6:00:00 |
| | - points: 1440, granularity: 0:01:00, timespan: 1 day, 0:00:00 |
| | - points: 1460, granularity: 6:00:00, timespan: 365 days, 0:00:00 |
| | - points: 1440, granularity: 0:07:00, timespan: 7 days, 0:00:00 |
| | - points: 1440, granularity: 0:30:00, timespan: 30 days, 0:00:00 |
| name | high |
+---------------------+-------------------------------------------------------------------+
[root@gz1-eden-control-001 ~(keystone_admin)]# rpm -qa | grep redis
python-redis-2.10.3-1.el7.noarch
redis-3.2.8-1.el7.x86_64
The version of OpenStack is Mitaka, and the version of Ceilometer is 6.1.3. For compatibility with Gnocchi 4.2.4, I changed the following code in /usr/lib/python2.7/site-packages/gnocchi/utils.py to make sure that the resource id stays the same as before:
def ResourceUUID(value, creator):
    if isinstance(value, uuid.UUID):
        return value
    if '/' in value:
        raise ValueError("'/' is not supported in resource id")
    try:
        try:
            return uuid.UUID(value)
        except ValueError:
            if len(value) <= 255:
                if six.PY2:
                    value = value.encode('utf-8')
                return uuid.uuid5(RESOURCE_ID_NAMESPACE, value)
            raise ValueError(
                'transformable resource id >255 max allowed characters')
    except Exception as e:
        raise ValueError(e)
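For context, the effect of that fallback can be checked standalone: any non-UUID resource id under 255 characters is mapped deterministically through uuid.uuid5. The snippet below is a minimal sketch of that behavior, using a stand-in namespace UUID (the real RESOURCE_ID_NAMESPACE lives in gnocchi/utils.py):

```python
import uuid

# Stand-in namespace for illustration; the real value is gnocchi's
# RESOURCE_ID_NAMESPACE constant.
RESOURCE_ID_NAMESPACE = uuid.UUID('00000000-0000-0000-0000-000000000000')

def resource_uuid(value):
    """Minimal sketch of the fallback path: non-UUID ids are hashed."""
    try:
        return uuid.UUID(value)  # already a UUID string: pass through as-is
    except ValueError:
        if len(value) > 255:
            raise ValueError('transformable resource id >255 max allowed characters')
        # uuid5 is deterministic: same namespace + name -> same UUID
        return uuid.uuid5(RESOURCE_ID_NAMESPACE, value)
```

The same input always maps to the same UUID, which is what keeps Ceilometer-era resource ids stable across upgrades.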
Is this the cause?
I don't think that your change is related.
How many nodes are running Gnocchi? Only one?
There's only one gnocchi node.
So, just to be clear, you have two problems:
2018-07-16 00:52:21,199 [146006] ERROR gnocchi.cli.metricd: Unexpected error updating the task partitioner: Unknown node 0256d3984057.6.af54813b-31d6-4509-91c3-207ab38
IndexError: index 4707715725467320320 is out of bounds for axis 1 with size 22
The first error would indicate something like Redis being restarted and cleaned while metricd was running. Can you start fresh to see if it's gone?
The second error is hard to understand without the data of the metric: its current content and whatever might be in the measure storage for this metric.
There are three resource types: instance_network_interface, instance, and instance_disk. They each have the following metric names:
[root@gd1-qy-controller-001 ~(keystone_admin)]# gnocchi resource show 0d2100aa-3eaf-4893-89b2-f5775104109b
+-----------------------+--------------------------------------------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------------------------------------------+
| created_by_project_id | 396863cc74234b4699de27aad8c1bba9 |
| created_by_user_id | 9f5551966a194b9bb0546fae57622d0b |
| creator | 9f5551966a194b9bb0546fae57622d0b:396863cc74234b4699de27aad8c1bba9 |
| ended_at | None |
| id | 0d2100aa-3eaf-4893-89b2-f5775104109b |
| metrics | cpu.delta: a0fa73ca-6015-4e4b-9f6d-12a7f34873d8 |
| | cpu_util: 70e15753-9bac-4292-8481-8381602cb5ef |
| | devops.memory.usage: e20dc2b2-c316-4f34-abb3-8e7c3a47e6b2 |
| | devops.partinode.total_inode.:data: f1ab7bee-f8d2-4ed6-8e0f-04d9693db5ae |
| | devops.partinode.total_inode.root: 1b504531-d166-40e7-9826-27bd524656e4 |
| | devops.partinode.used_inode.:data: afcda795-db82-4f01-9415-f6f9c5971228 |
| | devops.partinode.used_inode.root: e0cbf9b7-dfde-427e-ba70-53f49c66a41d |
| | devops.partsize.total_size.:data: 0e503868-236a-44cb-8cf2-89f8ce33dc91 |
| | devops.partsize.total_size.root: d91a6c26-c423-43e0-bea5-3797eef64d87 |
| | devops.partsize.used_size.:data: b9b5e02c-f99c-43d7-82c1-a7912d840ea0 |
| | devops.partsize.used_size.root: cf6ec443-232f-464a-aa24-869483073bd9 |
| | disk.read.bytes.rate: 341a56e4-01eb-4b92-921b-116576421779 |
| | disk.read.requests.rate: a330d72b-bc8d-48eb-92aa-1d0f884d70b6 |
| | disk.write.bytes.rate: dae48451-3c47-4304-98e8-97287322d2eb |
| | disk.write.requests.rate: e1b70c8d-d92d-4606-9171-4d2fd53493b9 |
| | memory.resident: bacb72ee-8f76-4b96-87fb-b9bb0df8f6db |
| | memory.usage: 3eed6a5b-03cf-4aa0-ad57-9981ddb0ee56 |
| original_resource_id | 0d2100aa-3eaf-4893-89b2-f5775104109b |
| project_id | fdfdddd8ad37468ab6929bc1aa3803e6 |
| revision_end | None |
| revision_start | 2018-06-28T09:59:00.786400+00:00 |
| started_at | 2018-06-28T09:59:00.786367+00:00 |
| type | instance |
| user_id | 1a0103e7afab445cacd2802701286b9e |
+-----------------------+--------------------------------------------------------------------------+
[root@gd1-qy-controller-001 ~(keystone_admin)]# gnocchi resource show 0037db6c-4d29-5fb1-80f5-e6fc8137f4aa
+-----------------------+-----------------------------------------------------------------------+
| Field | Value |
+-----------------------+-----------------------------------------------------------------------+
| created_by_project_id | 396863cc74234b4699de27aad8c1bba9 |
| created_by_user_id | 9f5551966a194b9bb0546fae57622d0b |
| creator | 9f5551966a194b9bb0546fae57622d0b:396863cc74234b4699de27aad8c1bba9 |
| ended_at | None |
| id | 0037db6c-4d29-5fb1-80f5-e6fc8137f4aa |
| metrics | network.incoming.bytes.rate: e24f65ef-029b-4ac7-88ff-98a44d432a5e |
| | network.incoming.packets.rate: 251d7810-92ce-4316-91d3-ffcd2b67e3e0 |
| | network.outgoing.bytes.rate: c684977b-cc43-4f53-88b4-79994af0c961 |
| | network.outgoing.packets.rate: 4e5f68fa-b5e7-4e4a-8ebd-62f60954c55c |
| original_resource_id | instance-00000022-40724a94-cffa-4e02-a54b-9e5cc11bff82-tapb68b7976-14 |
| project_id | 6359e1a306a64677a775d7db13750215 |
| revision_end | None |
| revision_start | 2018-06-28T10:05:14.855223+00:00 |
| started_at | 2018-06-28T10:05:14.855198+00:00 |
| type | instance_network_interface |
| user_id | 6baa87f84a2d4317b866e49df35e7da8 |
+-----------------------+-----------------------------------------------------------------------+
[root@gd1-qy-controller-001 ~(keystone_admin)]# gnocchi resource show 5065c6c2-fc06-5471-94c5-7da7a3eb17c0
+-----------------------+-----------------------------------------------------------------------+
| Field | Value |
+-----------------------+-----------------------------------------------------------------------+
| created_by_project_id | 396863cc74234b4699de27aad8c1bba9 |
| created_by_user_id | 754a340422ff4d6dbeb7e361468ee55a |
| creator | 754a340422ff4d6dbeb7e361468ee55a:396863cc74234b4699de27aad8c1bba9 |
| ended_at | None |
| id | 5065c6c2-fc06-5471-94c5-7da7a3eb17c0 |
| metrics | disk.device.read.bytes.rate: e9a822d5-0f9c-4cbe-8c94-7c5e4618cd44 |
| | disk.device.read.requests.rate: abf47aa5-59d1-40be-b140-051c5eb3fd11 |
| | disk.device.write.bytes.rate: 3ea776ea-4628-47a8-b9df-8a3eab920d15 |
| | disk.device.write.requests.rate: 7c0be1bc-54c0-47e5-9025-0715706854f3 |
| original_resource_id | bd386e59-e452-479f-9d1c-26611b0b454b-vda |
| project_id | 6e7849b0791b40aa8df6daf71efaec09 |
| revision_end | None |
| revision_start | 2018-06-28T06:19:21.691439+00:00 |
| started_at | 2018-06-28T06:19:21.691381+00:00 |
| type | instance_disk |
| user_id | 4f003a778cdc411f86c5aa722211851f |
+-----------------------+-----------------------------------------------------------------------+
The metric data is of float type.
That does not give us the actual data of the metric. Do a measures show on the metric that fails, and a dump of the measures objects from Redis itself.
@chungg @sileht You're more familiar with the quantile code, if you have any idea of what could cause that.
i can try looking at this after work. i really hope i didn't f this up :(
yeah, the index is super weird. how did it get such a large index if the length is only 22... that index would mean the array is thousands of petabytes big :|
maybe try logging/printing self.counts and self.indexes before:
    values = (
        self._ts['values'][ordered][floor_pos] * (ceil_pos - real_pos) +
        self._ts['values'][ordered][ceil_pos] * (real_pos - floor_pos))
Hello, because of the large number of metrics, I listed just one of them. This is its metric data:
[root@gd-gz02-control-001 ~(keystone_admin)]# gnocchi measures show ef2a6b0e-398d-459b-b1a1-bd75cd733427
+---------------------------+-------------+-----------------+
| timestamp | granularity | value |
+---------------------------+-------------+-----------------+
| 2018-03-12T00:00:00+00:00 | 21600.0 | 0.0944074671545 |
| 2018-03-12T00:00:00+00:00 | 1800.0 | 0.0938001045818 |
| 2018-03-12T04:00:00+00:00 | 1800.0 | 0.107636420373 |
| 2018-03-12T04:30:00+00:00 | 1800.0 | 0.0932090543916 |
| 2018-03-12T05:00:00+00:00 | 1800.0 | 0.0954996360816 |
| 2018-03-11T23:54:00+00:00 | 420.0 | 0.0938001045818 |
| 2018-03-12T04:27:00+00:00 | 420.0 | 0.1040357138 |
| 2018-03-12T04:34:00+00:00 | 420.0 | 0.100795033405 |
| 2018-03-12T04:41:00+00:00 | 420.0 | 0.0910234547819 |
| 2018-03-12T04:48:00+00:00 | 420.0 | 0.0862284984593 |
| 2018-03-12T04:55:00+00:00 | 420.0 | 0.0894921532808 |
| 2018-03-12T05:02:00+00:00 | 420.0 | 0.0954996360816 |
| 2018-03-12T00:00:00+00:00 | 60.0 | 0.0938001045818 |
| 2018-03-12T04:27:00+00:00 | 60.0 | 0.107636420373 |
| 2018-03-12T04:30:00+00:00 | 60.0 | 0.0932335940819 |
| 2018-03-12T04:33:00+00:00 | 60.0 | 0.107636420373 |
| 2018-03-12T04:34:00+00:00 | 60.0 | 0.100795033405 |
| 2018-03-12T04:38:00+00:00 | 60.0 | 0.100795033405 |
| 2018-03-12T04:41:00+00:00 | 60.0 | 0.0910234547819 |
| 2018-03-12T04:43:00+00:00 | 60.0 | 0.0910234547819 |
| 2018-03-12T04:48:00+00:00 | 60.0 | 0.0858752474672 |
| 2018-03-12T04:53:00+00:00 | 60.0 | 0.0865817494513 |
| 2018-03-12T04:55:00+00:00 | 60.0 | 0.0874896590139 |
| 2018-03-12T04:58:00+00:00 | 60.0 | 0.0874896590139 |
| 2018-03-12T05:00:00+00:00 | 60.0 | 0.0954996360816 |
| 2018-03-12T05:02:00+00:00 | 60.0 | 0.0954996360816 |
| 2018-03-12T05:03:00+00:00 | 60.0 | 0.0942641381421 |
| 2018-03-12T05:08:00+00:00 | 60.0 | 0.096735134021 |
| 2018-03-12T00:00:00+00:00 | 15.0 | 0.0938001045818 |
| 2018-03-12T04:27:00+00:00 | 15.0 | 0.107636420373 |
| 2018-03-12T04:30:00+00:00 | 15.0 | 0.0932335940819 |
| 2018-03-12T04:33:00+00:00 | 15.0 | 0.107636420373 |
| 2018-03-12T04:33:45+00:00 | 15.0 | 0.107636420373 |
| 2018-03-12T04:34:00+00:00 | 15.0 | 0.100795033405 |
| 2018-03-12T04:38:00+00:00 | 15.0 | 0.100795033405 |
| 2018-03-12T04:38:45+00:00 | 15.0 | 0.100795033405 |
| 2018-03-12T04:41:00+00:00 | 15.0 | 0.0910234547819 |
| 2018-03-12T04:43:00+00:00 | 15.0 | 0.0910234547819 |
| 2018-03-12T04:43:45+00:00 | 15.0 | 0.0910234547819 |
| 2018-03-12T04:48:00+00:00 | 15.0 | 0.0858752474672 |
| 2018-03-12T04:48:45+00:00 | 15.0 | 0.0858752474672 |
| 2018-03-12T04:53:00+00:00 | 15.0 | 0.0865817494513 |
| 2018-03-12T04:53:45+00:00 | 15.0 | 0.0865817494513 |
| 2018-03-12T04:55:00+00:00 | 15.0 | 0.0874896590139 |
| 2018-03-12T04:58:00+00:00 | 15.0 | 0.0874896590139 |
| 2018-03-12T04:58:45+00:00 | 15.0 | 0.0874896590139 |
| 2018-03-12T05:00:00+00:00 | 15.0 | 0.0954996360816 |
| 2018-03-12T05:02:00+00:00 | 15.0 | 0.0954996360816 |
| 2018-03-12T05:03:00+00:00 | 15.0 | 0.0942641381421 |
| 2018-03-12T05:03:45+00:00 | 15.0 | 0.0942641381421 |
| 2018-03-12T05:08:00+00:00 | 15.0 | 0.096735134021 |
| 2018-03-12T05:08:45+00:00 | 15.0 | 0.096735134021 |
+---------------------------+-------------+-----------------+
I tried to print self.counts and self.indexes. I made the following modifications to the code, but it doesn't print the values of self.counts and self.indexes.
import daiquiri

LOG = daiquiri.getLogger(__name__)

def quantile(self, q):
    LOG.info("self.indexes: %s, self.counts: %s" % (self.indexes, self.counts))
    ordered = numpy.lexsort((self._ts['values'], self.indexes))
    min_pos = numpy.cumsum(self.counts) - self.counts
    real_pos = min_pos + (self.counts - 1) * (q / 100)
    floor_pos = numpy.floor(real_pos).astype(numpy.int, copy=False)
    ceil_pos = numpy.ceil(real_pos).astype(numpy.int, copy=False)
    values = (
        self._ts['values'][ordered][floor_pos] * (ceil_pos - real_pos) +
        self._ts['values'][ordered][ceil_pos] * (real_pos - floor_pos))
    # NOTE(gordc): above code doesn't compute proper value if pct lands on
    # exact index, it sets it to 0. we need to set it properly here
    exact_pos = numpy.equal(floor_pos, ceil_pos)
    values[exact_pos] = self._ts['values'][ordered][floor_pos][exact_pos]
    return make_timeseries(self.tstamps, values)
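For reference, the interpolation in that method can be exercised outside Gnocchi with plain numpy arrays. grouped_quantile below is a hypothetical standalone rewrite of the same math, not the upstream function; it takes the values, their group indexes, and the per-group counts directly:

```python
import numpy as np

def grouped_quantile(values, indexes, counts, q):
    """Hypothetical standalone version of the quantile math above."""
    # sort by group index first, then by value within each group
    ordered = np.lexsort((values, indexes))
    # starting position of each group in the sorted array
    min_pos = np.cumsum(counts) - counts
    # fractional position of the q-th percentile inside each group
    real_pos = min_pos + (counts - 1) * (q / 100.0)
    floor_pos = np.floor(real_pos).astype(int)
    ceil_pos = np.ceil(real_pos).astype(int)
    # linear interpolation between the two neighboring sorted values
    result = (values[ordered][floor_pos] * (ceil_pos - real_pos) +
              values[ordered][ceil_pos] * (real_pos - floor_pos))
    # when real_pos lands exactly on an index the two terms cancel to 0,
    # so take the exact value instead (the NOTE(gordc) fix above)
    exact = np.equal(floor_pos, ceil_pos)
    result[exact] = values[ordered][floor_pos][exact]
    return result
```

With a single group of ten values 1..10 and q=95, real_pos is 8.55 and the result interpolates between the 9th and 10th sorted values.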
so it throws a traceback but doesn't log anything? alternatively, you could try/except the exception, log it, and re-raise the error (so it doesn't proceed to store any corrupt data)
Hello, chungg. I got these logs using your approach.
2018-07-19 16:37:37,662 [63] ERROR gnocchi.carbonara: self.indexes: ['2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
'2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
'2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
'2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
'2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'], self.counts: [10]. index 4746790760253227008 is out of bounds for axis 1 with size 10
2018-07-19 16:37:37,666 [63] ERROR gnocchi.storage: Error processing new measures
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 505, in process_new_measures
self._compute_and_store_timeseries(metric, measures)
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 580, in _compute_and_store_timeseries
before_truncate_callback=_map_add_measures)
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 348, in set_values
before_truncate_callback(self)
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 576, in _map_add_measures
for aggregation in agg_methods))
File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in parallel_map
return list(executor.map(lambda args: fn(*args), list_of_args))
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 641, in result_iterator
yield fs.pop().result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
return self.__get_result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 63, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in <lambda>
return list(executor.map(lambda args: fn(*args), list_of_args))
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 341, in _add_measures
grouped_serie, ap_def.granularity, aggregation_to_compute)
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 620, in from_grouped_serie
q))
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 743, in _resample_grouped
return agg_func(q) if agg_name == 'quantile' else agg_func()
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 201, in quantile
raise Exception, ex
IndexError: index 4746790760253227008 is out of bounds for axis 1 with size 10
2018-07-19 16:47:56,672 [63] ERROR gnocchi.carbonara: self.indexes: ['2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
'2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
'2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
'2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
'2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'], self.counts: [10]. index 4746790760253227008 is out of bounds for axis 1 with size 10
2018-07-19 16:47:56,674 [63] ERROR gnocchi.storage: Error processing new measures
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 505, in process_new_measures
self._compute_and_store_timeseries(metric, measures)
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 580, in _compute_and_store_timeseries
before_truncate_callback=_map_add_measures)
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 348, in set_values
before_truncate_callback(self)
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 576, in _map_add_measures
for aggregation in agg_methods))
File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in parallel_map
return list(executor.map(lambda args: fn(*args), list_of_args))
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 641, in result_iterator
yield fs.pop().result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
return self.__get_result()
File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 63, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in <lambda>
return list(executor.map(lambda args: fn(*args), list_of_args))
File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 341, in _add_measures
grouped_serie, ap_def.granularity, aggregation_to_compute)
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 620, in from_grouped_serie
q))
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 743, in _resample_grouped
return agg_func(q) if agg_name == 'quantile' else agg_func()
File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 201, in quantile
raise Exception, ex
IndexError: index 4746790760253227008 is out of bounds for axis 1 with size 10
hmmm... that shouldn't break anything... when i use the logged data in the code it runs fine using numpy 1.14.2.
maybe try logging the rest of the information? like ordered, floor_pos and ceil_pos?
i wonder if it's something related to threading? i hope not, but you could try disabling threading by setting parallel_operations = 1
I've disabled multithreading by setting parallel_operations = 1. It has run for a day without any more IndexError. But does it affect performance?
It just means some operations won't be parallelized. With Redis that should only have a low impact.
@chungg do you have any idea what might be thread unsafe?
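For readers following along: the tracebacks above show the per-aggregation work being fanned out through a helper in gnocchi/utils.py. Below is a hedged sketch of what such a parallel_map looks like, with names reconstructed from the traceback rather than copied from the upstream code:

```python
from concurrent import futures

def parallel_map(fn, list_of_args, parallel_operations=1):
    """Sketch of a parallel_map helper like the one in the traceback."""
    if parallel_operations == 1:
        # Serial path: parallel_operations = 1 means no thread pool at
        # all, which is why it sidesteps any thread-safety bug.
        return [fn(*args) for args in list_of_args]
    with futures.ThreadPoolExecutor(max_workers=parallel_operations) as executor:
        return list(executor.map(lambda args: fn(*args), list_of_args))
```

The aggregation jobs are independent per (metric, granularity, method), so serializing them only costs wall-clock time, which matches the comment above about the impact being low with Redis.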
Yes, at present there is no message backlog in RabbitMQ.
how frequent were you getting the IndexError previously?
@jd i have no idea. from a quick glance, it doesn't seem like any of the aggregations manipulate the input arrays.
@chungg The frequency is not regular, but without setting parallel_operations it will certainly appear within 4 hours. I tried setting parallel_operations = 8; the IndexError then recurs less often than without setting parallel_operations at all. It seems that the smaller the value of parallel_operations, the longer the period between occurrences.
@longxb040 to clarify:
- the lower you set parallel_operations, the more frequently you see the IndexError?
- after setting parallel_operations to 1, you don't see the IndexError at all?
i do find this very strange, especially the large index value it throws. i don't know what could cause it to randomly throw this.
@chungg Sorry, my previous description was not clear. The higher the value of parallel_operations, the less frequent the IndexError. After setting parallel_operations to 1, no IndexError has been seen at all.
i'm still wondering what is setting that index value. it would be good to know whether that index comes from ordered, min_pos, real_pos, floor_pos or ceil_pos. it seems strange that threading would have an impact on quantile calculations.
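One hypothetical way to answer that question is a small helper that checks each candidate index array against the valid bound and reports the offenders. Everything here, including the function name, is an assumption for illustration; it could be called just before the values computation with the arrays named above and the length of the values array as the bound:

```python
import numpy as np

def out_of_bounds_sources(arrays, size):
    """Return the names of index arrays holding entries outside [0, size)."""
    offenders = []
    for name, arr in arrays.items():
        arr = np.asarray(arr)
        # flag any array containing a negative or too-large index
        if arr.size and ((arr < 0) | (arr >= size)).any():
            offenders.append(name)
    return offenders
```

Logging the result alongside the IndexError would pin down which of ordered, min_pos, floor_pos, or ceil_pos carries the bogus value.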
Hello, everyone. I have a problem. I deployed Gnocchi 4.2.4 in a Docker container; the incoming driver and the storage driver are both Redis. This is my Gnocchi configuration:
gnocchi.conf
However, these errors are reported in the gnocchi logs: