gnocchixyz / gnocchi

Timeseries database
Apache License 2.0

IndexError: index 4707715725467320320 is out of bounds for axis 1 with size 22 #933

Closed. longxb040 closed this issue 2 years ago.

longxb040 commented 6 years ago

Hello, everyone. I have a problem. I deployed Gnocchi 4.2.4 in a Docker container, with Redis as both the incoming driver and the storage driver. This is my Gnocchi configuration:

gnocchi.conf

[DEFAULT]
log_dir = /var/log/gnocchi
debug = false
verbose = false
coordination_url = redis://qiyun-demo.vipstack.net:6379
[api]
auth_mode = keystone
[indexer]
url = mysql+pymysql://gnocchi:xxx@qiyun-demo.vipstack.net/gnocchi?charset=utf8
[storage]
driver = redis
redis_url = redis://qiyun-demo.vipstack.net:6379?db=10
[incoming]
driver = redis
redis_url = redis://qiyun-demo.vipstack.net:6379?db=11
[keystone_authtoken]
www_authenticate_uri=http://qiyun-demo.vipstack.net:5000/v2.0
identity_uri=http://qiyun-demo.vipstack.net:35357
memcached_servers = qiyun-demo.vipstack.net:11211
token_cache_time = 300
revocation_cache_time = 10
service_token_roles_required = true
admin_user=gnocchi
admin_password=xxx
admin_tenant_name=services
auth_version=v2.0
[metricd]
workers = 8
[statsd]
resource_id = 5e3fcbe2-7aab-475d-b42c-a440aa42e5ad
user_id = e0ca4711-1128-422c-abd6-62db246c32e7
project_id = af0c88e8-90d8-4795-9efe-57f965e67318
archive_policy_name = high
flush_delay = 10
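jd's question about the Redis setup can be partly answered from the configuration itself. A minimal sketch, standard library only, that extracts host, port and database index from the three redis:// URLs above. It assumes the drivers take the database number from the db query parameter and fall back to database 0 when none is given; redis_target is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

# URLs copied from the [DEFAULT], [storage] and [incoming] sections above.
URLS = {
    "coordination": "redis://qiyun-demo.vipstack.net:6379",
    "storage": "redis://qiyun-demo.vipstack.net:6379?db=10",
    "incoming": "redis://qiyun-demo.vipstack.net:6379?db=11",
}


def redis_target(url):
    """Return (host, port, db) for a redis:// URL; db defaults to 0 (assumed)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    db = int(query.get("db", ["0"])[0])
    return parts.hostname, parts.port, db


for name, url in URLS.items():
    print(name, redis_target(url))
```

Read this way, all three URLs point at the same Redis server: storage in db 10, incoming measures in db 11, and coordination in the default database.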

However, these errors are reported in the gnocchi logs:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 505, in process_new_measures
    self._compute_and_store_timeseries(metric, measures)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 580, in _compute_and_store_timeseries
    before_truncate_callback=_map_add_measures)
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 344, in set_values
    before_truncate_callback(self)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 576, in _map_add_measures
    for aggregation in agg_methods))
  File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in parallel_map
    return list(executor.map(lambda args: fn(*args), list_of_args))
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 641, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in <lambda>
    return list(executor.map(lambda args: fn(*args), list_of_args))
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 341, in _add_measures
    grouped_serie, ap_def.granularity, aggregation_to_compute)
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 616, in from_grouped_serie
    q))
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 739, in _resample_grouped
    return agg_func(q) if agg_name == 'quantile' else agg_func()
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 192, in quantile
    self._ts['values'][ordered][ceil_pos] * (real_pos - floor_pos))
IndexError: index 4707715725467320320 is out of bounds for axis 1 with size 22
2018-07-16 00:52:11,582 [1345864] ERROR    gnocchi.cli.metricd: Unexpected error updating the task partitioner: Unknown node `0256d3984057.6.af54813b-31d6-4509-91c3-207ab3818373'
2018-07-16 00:52:12,529 [1397669] ERROR    gnocchi.cli.metricd: Unexpected error updating the task partitioner: Unknown node `0256d3984057.6.af54813b-31d6-4509-91c3-207ab3818373'
2018-07-16 00:52:21,199 [146006] ERROR    gnocchi.cli.metricd: Unexpected error updating the task partitioner: Unknown node `0256d3984057.6.af54813b-31d6-4509-91c3-207ab3818373'
2018-07-16 05:35:07,865 [146006] ERROR    gnocchi.storage: Error processing new measures
longxb040 commented 6 years ago

Is there a problem with my Gnocchi configuration? Looking forward to your reply. Thanks.

jd commented 6 years ago

Is your Redis set up in some bizarre way?

Your IndexError is really weird. Can you show your archive policies? Do you have any way to reproduce this?

longxb040 commented 6 years ago

Hello, jd. Thank you for your reply. These are my archive policies:

[root@gz1-eden-control-001 ~(keystone_admin)]# gnocchi archive-policy show high

+---------------------+-------------------------------------------------------------------+
| Field               | Value                                                             |
+---------------------+-------------------------------------------------------------------+
| aggregation_methods | 95pct, mean                                                       |
| back_window         | 0                                                                 |
| definition          | - points: 1440, granularity: 0:00:15, timespan: 6:00:00           |
|                     | - points: 1440, granularity: 0:01:00, timespan: 1 day, 0:00:00    |
|                     | - points: 1460, granularity: 6:00:00, timespan: 365 days, 0:00:00 |
|                     | - points: 1440, granularity: 0:07:00, timespan: 7 days, 0:00:00   |
|                     | - points: 1440, granularity: 0:30:00, timespan: 30 days, 0:00:00  |
| name                | high                                                              |
+---------------------+-------------------------------------------------------------------+
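The timespan column of this policy is simply points multiplied by granularity. A quick check with the standard library's timedelta over the five definitions of the high policy; the same five granularities (15, 60, 420, 1800 and 21600 seconds) appear in the measures output later in the thread. Since 95pct is the only percentile method listed, it is presumably what routes these series through quantile() in the traceback.

```python
from datetime import timedelta

# (points, granularity) pairs from the "high" archive policy above.
DEFINITION = [
    (1440, timedelta(seconds=15)),
    (1440, timedelta(minutes=1)),
    (1460, timedelta(hours=6)),
    (1440, timedelta(minutes=7)),
    (1440, timedelta(minutes=30)),
]

for points, granularity in DEFINITION:
    # Each definition keeps `points` buckets of width `granularity`.
    print(points, granularity, points * granularity)
```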

[root@gz1-eden-control-001 ~(keystone_admin)]# rpm -qa | grep redis
python-redis-2.10.3-1.el7.noarch
redis-3.2.8-1.el7.x86_64

longxb040 commented 6 years ago

The OpenStack version is Mitaka and the Ceilometer version is 6.1.3. For compatibility with Gnocchi 4.2.4, I changed the following code in /usr/lib/python2.7/site-packages/gnocchi/utils.py to make sure that resource IDs stay the same as before:

def ResourceUUID(value, creator):
    if isinstance(value, uuid.UUID):
        return value
    if '/' in value:
        raise ValueError("'/' is not supported in resource id")
    try:
        try:
            return uuid.UUID(value)
        except ValueError:
            if len(value) <= 255:
                if six.PY2:
                    value = value.encode('utf-8')
                return uuid.uuid5(RESOURCE_ID_NAMESPACE, value)
            raise ValueError(
                'transformable resource id >255 max allowed characters')
    except Exception as e:
        raise ValueError(e)

Is this the cause?
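The modified function maps any non-UUID resource id to a deterministic version-5 UUID built from the id alone. A minimal sketch of why that keeps ids stable, under two assumptions: the all-zero namespace stands in for gnocchi's RESOURCE_ID_NAMESPACE (whose value is not reproduced here), and the unmodified 4.2.4 code is assumed to mix the creator into the uuid5 name, as its creator parameter suggests. Both helper names are hypothetical.

```python
import uuid

# Hypothetical namespace; gnocchi defines its own RESOURCE_ID_NAMESPACE.
NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def id_without_creator(value):
    # The modified ResourceUUID: only the original id feeds uuid5.
    return uuid.uuid5(NAMESPACE, value)


def id_with_creator(value, creator):
    # Assumed unmodified behaviour: the creator is part of the uuid5 name.
    return uuid.uuid5(NAMESPACE, value + "\x00" + creator)


# original_resource_id and creators from the resource show output below.
ORIGINAL = "bd386e59-e452-479f-9d1c-26611b0b454b-vda"
CREATOR_A = "9f5551966a194b9bb0546fae57622d0b:396863cc74234b4699de27aad8c1bba9"
CREATOR_B = "754a340422ff4d6dbeb7e361468ee55a:396863cc74234b4699de27aad8c1bba9"

print(id_without_creator(ORIGINAL))
print(id_with_creator(ORIGINAL, CREATOR_A))
print(id_with_creator(ORIGINAL, CREATOR_B))
```

Without the creator, the same original id always yields the same UUID no matter which user pushes it; with the creator mixed in, two creators produce two different ids.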

jd commented 6 years ago

I don't think that your change is related.

How many nodes are running Gnocchi? Only one?

longxb040 commented 6 years ago

There's only one gnocchi node.

jd commented 6 years ago

So just to be clear you have two problems:

  1. 2018-07-16 00:52:21,199 [146006] ERROR gnocchi.cli.metricd: Unexpected error updating the task partitioner: Unknown node `0256d3984057.6.af54813b-31d6-4509-91c3-207ab38'
  2. IndexError: index 4707715725467320320 is out of bounds for axis 1 with size 22

The first error would indicate something like Redis being restarted and cleared while metricd was running. Can you start fresh to see if it goes away?

The second error is hard to understand unless you provide the metric's data: its current content and whatever might still be in the measure storage for this metric.

longxb040 commented 6 years ago

There are three resource types: instance_network_interface, instance and instance_disk. Each has the following metrics:

[root@gd1-qy-controller-001 ~(keystone_admin)]# gnocchi resource show 0d2100aa-3eaf-4893-89b2-f5775104109b
+-----------------------+--------------------------------------------------------------------------+
| Field                 | Value                                                                    |
+-----------------------+--------------------------------------------------------------------------+
| created_by_project_id | 396863cc74234b4699de27aad8c1bba9                                         |
| created_by_user_id    | 9f5551966a194b9bb0546fae57622d0b                                         |
| creator               | 9f5551966a194b9bb0546fae57622d0b:396863cc74234b4699de27aad8c1bba9        |
| ended_at              | None                                                                     |
| id                    | 0d2100aa-3eaf-4893-89b2-f5775104109b                                     |
| metrics               | cpu.delta: a0fa73ca-6015-4e4b-9f6d-12a7f34873d8                          |
|                       | cpu_util: 70e15753-9bac-4292-8481-8381602cb5ef                           |
|                       | devops.memory.usage: e20dc2b2-c316-4f34-abb3-8e7c3a47e6b2                |
|                       | devops.partinode.total_inode.:data: f1ab7bee-f8d2-4ed6-8e0f-04d9693db5ae |
|                       | devops.partinode.total_inode.root: 1b504531-d166-40e7-9826-27bd524656e4  |
|                       | devops.partinode.used_inode.:data: afcda795-db82-4f01-9415-f6f9c5971228  |
|                       | devops.partinode.used_inode.root: e0cbf9b7-dfde-427e-ba70-53f49c66a41d   |
|                       | devops.partsize.total_size.:data: 0e503868-236a-44cb-8cf2-89f8ce33dc91   |
|                       | devops.partsize.total_size.root: d91a6c26-c423-43e0-bea5-3797eef64d87    |
|                       | devops.partsize.used_size.:data: b9b5e02c-f99c-43d7-82c1-a7912d840ea0    |
|                       | devops.partsize.used_size.root: cf6ec443-232f-464a-aa24-869483073bd9     |
|                       | disk.read.bytes.rate: 341a56e4-01eb-4b92-921b-116576421779               |
|                       | disk.read.requests.rate: a330d72b-bc8d-48eb-92aa-1d0f884d70b6            |
|                       | disk.write.bytes.rate: dae48451-3c47-4304-98e8-97287322d2eb              |
|                       | disk.write.requests.rate: e1b70c8d-d92d-4606-9171-4d2fd53493b9           |
|                       | memory.resident: bacb72ee-8f76-4b96-87fb-b9bb0df8f6db                    |
|                       | memory.usage: 3eed6a5b-03cf-4aa0-ad57-9981ddb0ee56                       |
| original_resource_id  | 0d2100aa-3eaf-4893-89b2-f5775104109b                                     |
| project_id            | fdfdddd8ad37468ab6929bc1aa3803e6                                         |
| revision_end          | None                                                                     |
| revision_start        | 2018-06-28T09:59:00.786400+00:00                                         |
| started_at            | 2018-06-28T09:59:00.786367+00:00                                         |
| type                  | instance                                                                 |
| user_id               | 1a0103e7afab445cacd2802701286b9e                                         |
+-----------------------+--------------------------------------------------------------------------+

[root@gd1-qy-controller-001 ~(keystone_admin)]# gnocchi resource show 0037db6c-4d29-5fb1-80f5-e6fc8137f4aa
+-----------------------+-----------------------------------------------------------------------+
| Field                 | Value                                                                 |
+-----------------------+-----------------------------------------------------------------------+
| created_by_project_id | 396863cc74234b4699de27aad8c1bba9                                      |
| created_by_user_id    | 9f5551966a194b9bb0546fae57622d0b                                      |
| creator               | 9f5551966a194b9bb0546fae57622d0b:396863cc74234b4699de27aad8c1bba9     |
| ended_at              | None                                                                  |
| id                    | 0037db6c-4d29-5fb1-80f5-e6fc8137f4aa                                  |
| metrics               | network.incoming.bytes.rate: e24f65ef-029b-4ac7-88ff-98a44d432a5e     |
|                       | network.incoming.packets.rate: 251d7810-92ce-4316-91d3-ffcd2b67e3e0   |
|                       | network.outgoing.bytes.rate: c684977b-cc43-4f53-88b4-79994af0c961     |
|                       | network.outgoing.packets.rate: 4e5f68fa-b5e7-4e4a-8ebd-62f60954c55c   |
| original_resource_id  | instance-00000022-40724a94-cffa-4e02-a54b-9e5cc11bff82-tapb68b7976-14 |
| project_id            | 6359e1a306a64677a775d7db13750215                                      |
| revision_end          | None                                                                  |
| revision_start        | 2018-06-28T10:05:14.855223+00:00                                      |
| started_at            | 2018-06-28T10:05:14.855198+00:00                                      |
| type                  | instance_network_interface                                            |
| user_id               | 6baa87f84a2d4317b866e49df35e7da8                                      |
+-----------------------+-----------------------------------------------------------------------+

[root@gd1-qy-controller-001 ~(keystone_admin)]# gnocchi resource show 5065c6c2-fc06-5471-94c5-7da7a3eb17c0
+-----------------------+-----------------------------------------------------------------------+
| Field                 | Value                                                                 |
+-----------------------+-----------------------------------------------------------------------+
| created_by_project_id | 396863cc74234b4699de27aad8c1bba9                                      |
| created_by_user_id    | 754a340422ff4d6dbeb7e361468ee55a                                      |
| creator               | 754a340422ff4d6dbeb7e361468ee55a:396863cc74234b4699de27aad8c1bba9     |
| ended_at              | None                                                                  |
| id                    | 5065c6c2-fc06-5471-94c5-7da7a3eb17c0                                  |
| metrics               | disk.device.read.bytes.rate: e9a822d5-0f9c-4cbe-8c94-7c5e4618cd44     |
|                       | disk.device.read.requests.rate: abf47aa5-59d1-40be-b140-051c5eb3fd11  |
|                       | disk.device.write.bytes.rate: 3ea776ea-4628-47a8-b9df-8a3eab920d15    |
|                       | disk.device.write.requests.rate: 7c0be1bc-54c0-47e5-9025-0715706854f3 |
| original_resource_id  | bd386e59-e452-479f-9d1c-26611b0b454b-vda                              |
| project_id            | 6e7849b0791b40aa8df6daf71efaec09                                      |
| revision_end          | None                                                                  |
| revision_start        | 2018-06-28T06:19:21.691439+00:00                                      |
| started_at            | 2018-06-28T06:19:21.691381+00:00                                      |
| type                  | instance_disk                                                         |
| user_id               | 4f003a778cdc411f86c5aa722211851f                                      |
+-----------------------+-----------------------------------------------------------------------+

The metric values are floats.

jd commented 6 years ago

That does not give us the actual data of the metric. Run a measures show on the metric that fails, and dump its measure objects from Redis itself.

jd commented 6 years ago

@chungg @sileht You're more familiar with the quantile code; do you have any idea what could cause this?

chungg commented 6 years ago

i can try looking at this after work. i really hope i didn't f this up :(

chungg commented 6 years ago

yeah, the index is super weird. how did it get such a large index if the length is only 22... that index would mean the array is thousands of petabytes big :|
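The arithmetic behind that estimate, assuming each element is an 8-byte float64 (the element size is an assumption; the traceback only gives the index):

```python
# Index from the first traceback; assumes 8-byte (float64) elements.
index = 4707715725467320320
bytes_needed = index * 8
petabytes = bytes_needed / 10**15
print(f"{petabytes:,.0f} PB")  # prints: 37,662 PB
```

So an array long enough for that index would need tens of thousands of petabytes, against an actual size of 22 elements.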

maybe if you try logging/printing self.counts and self.indexes before:

        values = (
            self._ts['values'][ordered][floor_pos] * (ceil_pos - real_pos) +
            self._ts['values'][ordered][ceil_pos] * (real_pos - floor_pos))
longxb040 commented 6 years ago

Hello. Because there are a large number of metrics, I have listed only one of them. This is its measure data:

[root@gd-gz02-control-001 ~(keystone_admin)]# gnocchi measures show ef2a6b0e-398d-459b-b1a1-bd75cd733427
+---------------------------+-------------+-----------------+
| timestamp                 | granularity |           value |
+---------------------------+-------------+-----------------+
| 2018-03-12T00:00:00+00:00 |     21600.0 | 0.0944074671545 |
| 2018-03-12T00:00:00+00:00 |      1800.0 | 0.0938001045818 |
| 2018-03-12T04:00:00+00:00 |      1800.0 |  0.107636420373 |
| 2018-03-12T04:30:00+00:00 |      1800.0 | 0.0932090543916 |
| 2018-03-12T05:00:00+00:00 |      1800.0 | 0.0954996360816 |
| 2018-03-11T23:54:00+00:00 |       420.0 | 0.0938001045818 |
| 2018-03-12T04:27:00+00:00 |       420.0 |    0.1040357138 |
| 2018-03-12T04:34:00+00:00 |       420.0 |  0.100795033405 |
| 2018-03-12T04:41:00+00:00 |       420.0 | 0.0910234547819 |
| 2018-03-12T04:48:00+00:00 |       420.0 | 0.0862284984593 |
| 2018-03-12T04:55:00+00:00 |       420.0 | 0.0894921532808 |
| 2018-03-12T05:02:00+00:00 |       420.0 | 0.0954996360816 |
| 2018-03-12T00:00:00+00:00 |        60.0 | 0.0938001045818 |
| 2018-03-12T04:27:00+00:00 |        60.0 |  0.107636420373 |
| 2018-03-12T04:30:00+00:00 |        60.0 | 0.0932335940819 |
| 2018-03-12T04:33:00+00:00 |        60.0 |  0.107636420373 |
| 2018-03-12T04:34:00+00:00 |        60.0 |  0.100795033405 |
| 2018-03-12T04:38:00+00:00 |        60.0 |  0.100795033405 |
| 2018-03-12T04:41:00+00:00 |        60.0 | 0.0910234547819 |
| 2018-03-12T04:43:00+00:00 |        60.0 | 0.0910234547819 |
| 2018-03-12T04:48:00+00:00 |        60.0 | 0.0858752474672 |
| 2018-03-12T04:53:00+00:00 |        60.0 | 0.0865817494513 |
| 2018-03-12T04:55:00+00:00 |        60.0 | 0.0874896590139 |
| 2018-03-12T04:58:00+00:00 |        60.0 | 0.0874896590139 |
| 2018-03-12T05:00:00+00:00 |        60.0 | 0.0954996360816 |
| 2018-03-12T05:02:00+00:00 |        60.0 | 0.0954996360816 |
| 2018-03-12T05:03:00+00:00 |        60.0 | 0.0942641381421 |
| 2018-03-12T05:08:00+00:00 |        60.0 |  0.096735134021 |
| 2018-03-12T00:00:00+00:00 |        15.0 | 0.0938001045818 |
| 2018-03-12T04:27:00+00:00 |        15.0 |  0.107636420373 |
| 2018-03-12T04:30:00+00:00 |        15.0 | 0.0932335940819 |
| 2018-03-12T04:33:00+00:00 |        15.0 |  0.107636420373 |
| 2018-03-12T04:33:45+00:00 |        15.0 |  0.107636420373 |
| 2018-03-12T04:34:00+00:00 |        15.0 |  0.100795033405 |
| 2018-03-12T04:38:00+00:00 |        15.0 |  0.100795033405 |
| 2018-03-12T04:38:45+00:00 |        15.0 |  0.100795033405 |
| 2018-03-12T04:41:00+00:00 |        15.0 | 0.0910234547819 |
| 2018-03-12T04:43:00+00:00 |        15.0 | 0.0910234547819 |
| 2018-03-12T04:43:45+00:00 |        15.0 | 0.0910234547819 |
| 2018-03-12T04:48:00+00:00 |        15.0 | 0.0858752474672 |
| 2018-03-12T04:48:45+00:00 |        15.0 | 0.0858752474672 |
| 2018-03-12T04:53:00+00:00 |        15.0 | 0.0865817494513 |
| 2018-03-12T04:53:45+00:00 |        15.0 | 0.0865817494513 |
| 2018-03-12T04:55:00+00:00 |        15.0 | 0.0874896590139 |
| 2018-03-12T04:58:00+00:00 |        15.0 | 0.0874896590139 |
| 2018-03-12T04:58:45+00:00 |        15.0 | 0.0874896590139 |
| 2018-03-12T05:00:00+00:00 |        15.0 | 0.0954996360816 |
| 2018-03-12T05:02:00+00:00 |        15.0 | 0.0954996360816 |
| 2018-03-12T05:03:00+00:00 |        15.0 | 0.0942641381421 |
| 2018-03-12T05:03:45+00:00 |        15.0 | 0.0942641381421 |
| 2018-03-12T05:08:00+00:00 |        15.0 |  0.096735134021 |
| 2018-03-12T05:08:45+00:00 |        15.0 |  0.096735134021 |
+---------------------------+-------------+-----------------+

I tried to print self.counts and self.indexes. I made the following modification to the code, but it doesn't print their values.

import daiquiri
LOG = daiquiri.getLogger(__name__)

# Excerpt of the quantile() method in gnocchi/carbonara.py; numpy and
# make_timeseries are already available in that module.
def quantile(self, q):
    LOG.info("self.indexes: %s, self.counts: %s", self.indexes, self.counts)
    ordered = numpy.lexsort((self._ts['values'], self.indexes))
    min_pos = numpy.cumsum(self.counts) - self.counts
    real_pos = min_pos + (self.counts - 1) * (q / 100)
    floor_pos = numpy.floor(real_pos).astype(numpy.int, copy=False)
    ceil_pos = numpy.ceil(real_pos).astype(numpy.int, copy=False)
    values = (
        self._ts['values'][ordered][floor_pos] * (ceil_pos - real_pos) +
        self._ts['values'][ordered][ceil_pos] * (real_pos - floor_pos))
    # NOTE(gordc): above code doesn't compute proper value if pct lands on
    # exact index, it sets it to 0. we need to set it properly here
    exact_pos = numpy.equal(floor_pos, ceil_pos)
    values[exact_pos] = self._ts['values'][ordered][floor_pos][exact_pos]
    return make_timeseries(self.tstamps, values)
chungg commented 6 years ago

so it throws a traceback but doesn't log anything? alternatively, you could wrap the code in try/except, log the values, and re-raise the error (so it doesn't proceed to store any corrupt data)
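A minimal sketch of the log-and-re-raise pattern; checked_take is a hypothetical helper, not part of gnocchi. A bare raise re-raises the caught exception with its original traceback, whereas in the logs further down the innermost frame is the raise Exception, ex line (Python 2-only syntax) rather than the indexing line that actually failed.

```python
import logging

import numpy

LOG = logging.getLogger(__name__)


def checked_take(values, positions):
    """Index `values` with `positions`, logging the inputs if it fails."""
    try:
        return values[positions]
    except IndexError:
        # Log the state that caused the failure, then re-raise the same
        # exception with its original traceback (bare `raise`), so no
        # partial result is returned or stored.
        LOG.error("positions=%s, size=%d", positions, len(values))
        raise
```

Usage: checked_take(numpy.arange(3.0), numpy.array([2])) returns array([2.]), while an out-of-range position is logged and then raises the original IndexError.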

longxb040 commented 6 years ago

Hello, chungg. I got these logs using your approach:

2018-07-19 16:37:37,662 [63] ERROR    gnocchi.carbonara: self.indexes: ['2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
 '2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
 '2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
 '2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'
 '2018-07-19T16:30:00.000000000' '2018-07-19T16:30:00.000000000'], self.counts: [10]. index 4746790760253227008 is out of bounds for axis 1 with size 10
2018-07-19 16:37:37,666 [63] ERROR    gnocchi.storage: Error processing new measures
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 505, in process_new_measures
    self._compute_and_store_timeseries(metric, measures)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 580, in _compute_and_store_timeseries
    before_truncate_callback=_map_add_measures)
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 348, in set_values
    before_truncate_callback(self)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 576, in _map_add_measures
    for aggregation in agg_methods))
  File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in parallel_map
    return list(executor.map(lambda args: fn(*args), list_of_args))
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 641, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in <lambda>
    return list(executor.map(lambda args: fn(*args), list_of_args))
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 341, in _add_measures
    grouped_serie, ap_def.granularity, aggregation_to_compute)
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 620, in from_grouped_serie
    q))
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 743, in _resample_grouped
    return agg_func(q) if agg_name == 'quantile' else agg_func()
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 201, in quantile
    raise Exception, ex
IndexError: index 4746790760253227008 is out of bounds for axis 1 with size 10
2018-07-19 16:47:56,672 [63] ERROR    gnocchi.carbonara: self.indexes: ['2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
 '2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
 '2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
 '2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'
 '2018-07-19T16:41:00.000000000' '2018-07-19T16:41:00.000000000'], self.counts: [10]. index 4746790760253227008 is out of bounds for axis 1 with size 10
2018-07-19 16:47:56,674 [63] ERROR    gnocchi.storage: Error processing new measures
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 505, in process_new_measures
    self._compute_and_store_timeseries(metric, measures)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 580, in _compute_and_store_timeseries
    before_truncate_callback=_map_add_measures)
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 348, in set_values
    before_truncate_callback(self)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 576, in _map_add_measures
    for aggregation in agg_methods))
  File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in parallel_map
    return list(executor.map(lambda args: fn(*args), list_of_args))
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 641, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 308, in <lambda>
    return list(executor.map(lambda args: fn(*args), list_of_args))
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 341, in _add_measures
    grouped_serie, ap_def.granularity, aggregation_to_compute)
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 620, in from_grouped_serie
    q))
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 743, in _resample_grouped
    return agg_func(q) if agg_name == 'quantile' else agg_func()
  File "/usr/lib/python2.7/site-packages/gnocchi/carbonara.py", line 201, in quantile
    raise Exception, ex
IndexError: index 4746790760253227008 is out of bounds for axis 1 with size 10
chungg commented 6 years ago

hmmm... that shouldn't break anything... when i use the logged data in the code it runs fine using numpy 1.14.2.
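A minimal, self-contained sketch of that check: the body of quantile() above rewritten as a free function and fed the logged self.indexes (ten identical timestamps) and self.counts ([10]) with q = 95 for the 95pct method. The log does not include the values, so VALUES is hypothetical. numpy.int is replaced by the builtin int, which it aliased, because newer NumPy releases removed the alias.

```python
import numpy


def quantile_sketch(values, indexes, counts, q):
    """Per-group quantile by linear interpolation, following the body of
    quantile() quoted above (a free function, not gnocchi's API)."""
    # Sort by group (indexes) first, then by value inside each group.
    ordered = numpy.lexsort((values, indexes))
    # Position of the first element of each group in the sorted array.
    min_pos = numpy.cumsum(counts) - counts
    # Fractional position of the q-th percentile inside each group.
    real_pos = min_pos + (counts - 1) * (q / 100)
    floor_pos = numpy.floor(real_pos).astype(int, copy=False)
    ceil_pos = numpy.ceil(real_pos).astype(int, copy=False)
    sorted_values = values[ordered]
    result = (sorted_values[floor_pos] * (ceil_pos - real_pos) +
              sorted_values[ceil_pos] * (real_pos - floor_pos))
    # When the position is exact both weights are 0: take the value itself.
    exact_pos = numpy.equal(floor_pos, ceil_pos)
    result[exact_pos] = sorted_values[floor_pos][exact_pos]
    return result


# Logged inputs: ten identical timestamps forming one group of 10.
INDEXES = numpy.array(["2018-07-19T16:30:00"] * 10, dtype="datetime64[ns]")
COUNTS = numpy.array([10])
# The log does not show the values; these are hypothetical.
VALUES = numpy.arange(10, dtype=float)

print(quantile_sketch(VALUES, INDEXES, COUNTS, 95.0))
```

With these inputs real_pos is 8.55, so floor_pos is [8] and ceil_pos is [9], both well inside an array of size 10; nothing in the arithmetic itself can produce an index in the quintillions.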

maybe try logging the rest of the information? like ordered, floor_pos and ceil_pos?

i wonder if it's something related to threading? i hope not, but you could try disabling threading by setting parallel_operations = 1
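The parallel_map frame in the tracebacks runs executor.map(lambda args: fn(*args), list_of_args) on a thread pool. A minimal sketch of how a worker count of 1 can bypass threading entirely; the max_workers parameter and the sequential branch are assumptions for illustration, not a copy of gnocchi's utils.py.

```python
from concurrent import futures


def parallel_map(fn, list_of_args, max_workers):
    """Apply fn(*args) to each tuple in list_of_args, preserving order.

    Hypothetical sketch: with max_workers == 1 the calls run sequentially
    in the calling thread, so no thread pool is involved at all.
    """
    if max_workers == 1:
        return [fn(*args) for args in list_of_args]
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda args: fn(*args), list_of_args))
```

Usage: parallel_map(pow, [(2, 3), (3, 2)], max_workers=1) and the same call with max_workers=4 both return [8, 9], because executor.map yields results in input order. Only the threaded path can expose a race.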

longxb040 commented 6 years ago

I've disabled multithreading by setting parallel_operations = 1. I have been watching it for a day and no more IndexErrors have appeared. But does this affect performance?

jd commented 6 years ago

It just means some operations won't be parallelized. With Redis that should only have a low impact.

@chungg do you have any idea what might be thread unsafe?

longxb040 commented 6 years ago

Yes, at present there is no message backlog in RabbitMQ.

chungg commented 6 years ago

how frequent were you getting the IndexError previously?

@jd i have no idea. from a quick glance, it doesn't seem like any of the aggregations manipulate the input arrays.
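The claim that the aggregations do not touch shared state can be probed with a small stress test. A minimal sketch under these assumptions: quantile_95 is a simplified stand-in for carbonara's quantile (one group, values already sorted, q fixed at 95), and VALUES and COUNTS are hypothetical inputs shaped like the logged case. It runs the same computation on several threads over shared read-only arrays and collects the distinct results.

```python
from concurrent import futures

import numpy

# Shared, read-only inputs shaped like the logged case: one group of 10.
VALUES = numpy.linspace(0.08, 0.11, 10)  # hypothetical, already sorted
COUNTS = numpy.array([10])


def quantile_95(values, counts):
    # Linear-interpolation 95th percentile per group, in the style of
    # carbonara's quantile (values assumed sorted within each group).
    real_pos = (numpy.cumsum(counts) - counts) + (counts - 1) * 0.95
    floor_pos = numpy.floor(real_pos).astype(int)
    ceil_pos = numpy.ceil(real_pos).astype(int)
    return (values[floor_pos] * (ceil_pos - real_pos) +
            values[ceil_pos] * (real_pos - floor_pos))


def distinct_results(workers, calls):
    """Run quantile_95 `calls` times on `workers` threads and return the
    set of distinct results (as bytes); one element means all agreed."""
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        results = executor.map(
            lambda _: quantile_95(VALUES, COUNTS).tobytes(), range(calls))
        return set(results)
```

A single distinct result only shows that this stand-in is deterministic under threads. It does not exercise gnocchi's own code path, so it cannot rule out a race elsewhere in metricd.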

longxb040 commented 6 years ago

@chungg The frequency is irregular, but without parallel_operations set, the error always appears within 4 hours. I tried setting parallel_operations = 8, and the interval between IndexErrors became longer than with parallel_operations unset. It seems that the smaller the value of parallel_operations, the longer the interval.

chungg commented 6 years ago

@longxb040 to clarify,

i do find this very strange especially the large index value it throws. i don't know what could cause it to randomly throw this.

longxb040 commented 6 years ago

@chungg Sorry, my previous description was not clear. The higher the value of parallel_operations, the less frequent the IndexError. After setting parallel_operations to 1, no IndexError has been seen so far.

chungg commented 6 years ago

i'm still wondering what is setting that index value. it would be good to know whether that index comes from: ordered, min_pos, real_pos, floor_pos or ceil_pos. it seems strange that threading would have an impact on quantile calculations.
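Of the arrays listed, ordered, floor_pos and ceil_pos are the ones actually used as indexes (min_pos and real_pos only feed into them), so a check placed next to the logging in quantile() could report which of them holds the bad entry. A minimal sketch; find_bad_positions is a hypothetical debugging helper, and in the example the choice of ceil_pos as the array carrying the logged index is made up for illustration.

```python
import numpy


def find_bad_positions(size, **positions):
    """Return the names of index arrays containing an entry that would
    raise IndexError on an array of length `size` (valid integer indexes
    lie in [-size, size))."""
    bad = []
    for name, pos in positions.items():
        pos = numpy.asarray(pos)
        if pos.size and (pos.min() < -size or pos.max() >= size):
            bad.append(name)
    return bad


# Size and index taken from the second log; which array held the huge
# value is hypothetical here.
print(find_bad_positions(
    10,
    ordered=numpy.arange(10),
    floor_pos=numpy.array([8]),
    ceil_pos=numpy.array([4746790760253227008]),
))  # prints: ['ceil_pos']
```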