0-complexity / openvcloud


redis OOM command not allowed when used memory > 'maxmemory'. #1009

Open dinosn opened 6 years ago

dinosn commented 6 years ago

We have a few cases in all environments where Redis shows an error as described above.

The case appears on the current OVC 2.2.1 and on the previous version 2.2.0.

An error like the one below will appear on alerta.

https://alerta.aydo.com/#/alert/7fb54510-e33a-4c97-a34c-1b29efca2889

7fb54510 uk-dc-1: Info - ErrorCondition on 57604fac-22f3-d37f-7f6d-89bd1a7c6b0d could not report job in error to agentcontroller ERROR: Remote Backtrace

```
Traceback (most recent call last):
  File "/opt/jumpscale7/lib/JumpScale/grid/serverbase/Daemon.py", line 231, in processRPC
    result = ffunction(data)
  File "controller.py", line 661, in notifyWorkCompleted
    self._setJob(job, osis=saveinosis)
  File "controller.py", line 197, in _setJob
    self.redis.hset("jobs:%s" % job["gid"], job["guid"], jobs)
  File "/opt/jumpscale7/lib/redis/client.py", line 1853, in hset
    return self.execute_command('HSET', name, key, value)
  File "/opt/jumpscale7/lib/redis/client.py", line 565, in execute_command
    return self.parse_response(connection, command_name, options)
  File "/opt/jumpscale7/lib/redis/client.py", line 577, in parse_response
    response = connection.read_response()
  File "/opt/jumpscale7/lib/redis/connection.py", line 574, in read_response
    raise response
ResponseError: OOM command not allowed when used memory > 'maxmemory'.
```
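For context on the error itself: with the default `noeviction` policy, Redis starts rejecting memory-consuming commands such as `HSET` once `used_memory` exceeds `maxmemory`, and the client surfaces this as a `ResponseError`. A minimal toy model of that behavior (the `FakeRedis` class is hypothetical illustration, not OVC or redis-py code):

```python
class ResponseError(Exception):
    """Stand-in for redis.exceptions.ResponseError."""

class FakeRedis:
    """Toy model of a Redis server with maxmemory + noeviction.

    Writes are rejected once the (approximate) used memory would
    exceed maxmemory, mirroring the OOM error in the traceback.
    """
    def __init__(self, maxmemory):
        self.maxmemory = maxmemory
        self.store = {}

    def used_memory(self):
        # Crude approximation: sum of key and value lengths
        return sum(len(k) + len(v) for k, v in self.store.items())

    def set_like(self, key, value):
        if self.used_memory() + len(key) + len(value) > self.maxmemory:
            raise ResponseError(
                "OOM command not allowed when used memory > 'maxmemory'.")
        self.store[key] = value

r = FakeRedis(maxmemory=64)
r.set_like("jobs:888", "x" * 16)           # fits under the limit
try:
    r.set_like("eco:objects", "y" * 100)   # would exceed the limit
except ResponseError as e:
    print(e)  # OOM command not allowed when used memory > 'maxmemory'.
```

The point is that Redis does not crash at this moment; it keeps serving reads and rejecting writes, which is why the daemon above fails mid-RPC.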

================

Client BackTrace

```
File "worker.py", line 276, in <module>
  worker.run()
File "worker.py", line 205, in run
  self.notifyWorkCompleted(job)
File "worker.py", line 237, in notifyWorkCompleted
  reportJob()
File "worker.py", line 226, in reportJob
  acclient.notifyWorkCompleted(job.__dict__)
File "<string>", line 9, in method
File "/opt/jumpscale7/lib/JumpScale/grid/serverbase/DaemonClient.py", line 282, in sendcmd
  return self.sendMsgOverCMDChannel(cmd, args, sendformat, returnformat, category=category, transporttimeout=transporttimeout)
File "/opt/jumpscale7/lib/JumpScale/grid/serverbase/DaemonClient.py", line 196, in sendMsgOverCMDChannel
  raise RemoteException("Cannot execute cmd:%s/%s on server:'%s:%s' error:'%s' ((ECOID:%s))"
                        % (category, cmd, ecodict["gid"], ecodict["nid"], ecodict["errormessage"], ecodict["guid"]), ecodict)
```

type/level: UNKNOWN/2 ERROR IN RPC CALL notifyWorkCompleted: ResponseError: OOM command not allowed when used memory > 'maxmemory'.

Session:

```
{u'roles': [u'node', u'storagenode', u'storagedriver', u'storagemaster'],
 u'encrkey': u'', u'nid': 21, u'start': 1511817177,
 u'netinfo': [{u'ip': u'127.0.0.1', u'mac': u'00:00:00:00:00:00', u'cidr': u'8', u'name': u'lo', u'mtu': 65536},
              {u'ip': u'10.16.0.63', u'mac': u'a8:1e:84:96:45:5a', u'cidr': u'24', u'name': u'eno1', u'mtu': 1500},
              {u'ip': , u'mac': u'a8:1e:84:96:45:5b', u'cidr': , u'name': u'eno2', u'mtu': 1500},
              {u'ip': , u'mac': u'4a:2a:41:8c:d1:aa', u'cidr': , u'name': u'ovs-system', u'mtu': 1500},
              {u'ip': u'10.16.1.63', u'mac': u'ec:0d:9a:1c:03:a0', u'cidr': u'24', u'name': u'backplane1', u'mtu': 2000},
              {u'ip': , u'mac': u'ec:0d:9a:1c:03:a0', u'cidr': , u'name': u'enp4s0', u'mtu': 2000},
              {u'ip': , u'mac': u'ec:0d:9a:1c:03:a1', u'cidr': , u'name': u'enp4s0d1', u'mtu': 2000},
              {u'ip': u'10.16.2.63', u'mac': u'80:22:c0:ff:ee:63', u'cidr': u'24', u'name': u'enp4s0f1', u'mtu': 9000},
              {u'ip': , u'mac': u'2a:a2:64:07:1b:1c', u'cidr': , u'name': u'enp4s0f1d1', u'mtu': 1500}],
 u'gid': 888, u'passwd': u'****', u'user': u'', u'organization': u'myorg',
 u'id': u'888-21-0-72c1db8a-4639-41f0-b78c-a630ad45832c'}
```

Data:

```
{u'job': {u'timeStop': 1511892964,
          u'result': u'openvstorage+tcp://10.16.2.63:26203/vm-670/cloud-init-vm-670@94a62f62-fa51-46e1-93df-d36f917b9133',
          u'errorreport': False, u'guid': u'6be9852e4a374a65bb12c3108b86f569', u'id': 2846,
          u'category': u'greenitglobe', u'timeStart': 1511892938.855335, u'log': True,
          u'timeCreate': 1511892938, u'state': u'OK', u'internal': False, u'gid': 888,
          u'jscriptid': 67, u'parent': None,
          u'args': {u'userdata': {}, u'type': u'Windows', u'name': u'vm-670',
                    u'metadata': {u'admin-pass': u'6KSycg1gP', u'hostname': u'vm-670'}},
          u'nid': 21, u'achost': u'64.253.35.36',
          u'sessionid': u'888-1-0-15bf7053-2055-44d8-942f-4c811642bb2e', u'wait': True,
          u'_meta': [u'system', u'job', 1], u'roles': [u'storagedriver'],
          u'cmd': u'createmetaiso', u'queue': u'', u'timeout': 600, u'resultcode': 0,
          u'_ckey': u''}}
```

On the same topic, we are starting to see Redis reach the memory usage limit allocated to it.

(screenshot from 2017-11-28 at 20:09:57)

Is it perhaps time to increase the Redis memory allocation?

Issue also reported on gogs at: https://docs.greenitglobe.com/gig/org_support/issues/469

dinosn commented 6 years ago

The issue also results in the following on the panel:

(screenshot from 2017-11-28 at 20:32:36)

Issue reported at gogs: https://docs.greenitglobe.com/gig/proj_gig_uk/issues/59

dinosn commented 6 years ago

More info at https://docs.greenitglobe.com/gig/org_support/issues/473

ashraffouda commented 6 years ago

@grimpy @dinosn the memory usage before this happens is high compared to the usage afterwards:

```
rdb --command memory ~/Documents/dump-before.rdb --bytes 1000
```

(screenshot from 2017-12-05 10:10:51)

```
rdb --command memory ~/Documents/dump-after.rdb --bytes 1000
```

(screenshot from 2017-12-05 10:10:57)

This is the size of the eco:objects key:

```
-rw-r--r--. 1 afouda afouda  17M Dec 5 09:45 before.json
-rw-r--r--. 1 afouda afouda 2.3M Dec 5 09:44 after.json
```

From what I can see, the keys that take up most of the memory are eco:objects and queue:stats:min. One thing I want to know: did Redis crash after the OOM happened? I see fewer keys after the error, and a crash may be the cause of this. We have a maxmemory limit of 100 MB; should we increase it?
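The rdb-tools output above reports memory per key; a quick way to see which key families dominate is to aggregate size by key prefix. A small sketch of that aggregation (the sample sizes are hypothetical, not the real dump contents):

```python
from collections import defaultdict

def memory_by_prefix(keys_and_sizes):
    """Aggregate approximate memory per key family, grouping by the
    first ':'-delimited segment (e.g. 'jobs:888' -> 'jobs')."""
    totals = defaultdict(int)
    for key, size in keys_and_sizes:
        family = key.split(":", 1)[0]
        totals[family] += size
    return dict(totals)

# Hypothetical (key, size-in-bytes) pairs, standing in for rdb output
sample = [
    ("eco:objects", 17 * 1024 * 1024),
    ("queue:stats:min", 3 * 1024 * 1024),
    ("jobs:888", 200 * 1024),
    ("jobs:889", 150 * 1024),
]
print(memory_by_prefix(sample))
```

With real dump data this makes it obvious whether eco:objects alone is enough to blow through a 100 MB maxmemory.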

ashraffouda commented 6 years ago

Update found this in redis docs https://redis.io/topics/admin

> Make sure to setup some swap in your system (we suggest as much as swap as memory). If Linux does not have swap and your Redis instance accidentally consumes too much memory, either Redis will crash for out of memory or the Linux kernel OOM killer will kill the Redis process.

FastGeert commented 6 years ago

The problem is that eco's have become much larger than before. They should stay that way, because the larger eco's are a lot more useful.

Anyway, it seems that this is what causes Redis to fill up, so we need to solve it. AFAIK we put the eco's in Redis for deduplication. So instead of putting the eco's completely in Redis, we should only put a hash calculated from the parameters we use for deduplication in Redis, and store the complete object only in MongoDB.
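The proposal above can be sketched as follows: keep only a small digest of the dedup parameters in Redis, and let the full eco object live in MongoDB. This is a hypothetical illustration of the idea, not the OVC implementation; the parameter names and the in-memory `seen` set (standing in for a Redis SET) are assumptions:

```python
import hashlib
import json

def dedup_key(params):
    """40-character SHA-1 digest of the dedup parameters.

    Only this digest would be stored in Redis; the full eco object
    (potentially kilobytes) would go to MongoDB.
    """
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()

seen = set()  # stand-in for a Redis SET of digests

def is_duplicate(params):
    key = dedup_key(params)
    if key in seen:
        return True
    seen.add(key)
    return False

# Hypothetical eco dedup parameters
eco = {"errormessage": "OOM command not allowed ...", "gid": 888, "nid": 21}
print(is_duplicate(eco))  # False: first occurrence, store full eco in MongoDB
print(is_duplicate(eco))  # True: repeat, only the 40-byte digest was needed
```

Sorting the keys before hashing makes the digest independent of dict ordering, so logically identical eco's always collapse to the same Redis entry.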

ashraffouda commented 6 years ago

Found out that we don't need the whole object; we only need specific data which is very tiny (timestamps) to check against. So I will drop the object and keep only the data required for doing dedup; hashing is not needed.
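A minimal sketch of that simpler approach, keeping only a last-seen timestamp per eco key instead of the full object (class name, key format, and the time window are assumptions for illustration):

```python
import time

class EcoDedup:
    """Dedup eco reports by keeping only a tiny last-seen timestamp
    per eco key, instead of the full serialized object."""

    def __init__(self, window=300.0):
        self.window = window   # seconds within which repeats are suppressed
        self.last_seen = {}    # eco key -> timestamp (tiny memory footprint)

    def should_report(self, eco_key, now=None):
        now = time.time() if now is None else now
        last = self.last_seen.get(eco_key)
        self.last_seen[eco_key] = now
        return last is None or now - last > self.window

d = EcoDedup(window=300)
print(d.should_report("888:21:OOM", now=1000.0))  # True: first occurrence
print(d.should_report("888:21:OOM", now=1100.0))  # False: repeat inside window
print(d.should_report("888:21:OOM", now=1500.0))  # True: 400s since last seen
```

Per key this stores only a float instead of a multi-kilobyte eco object, which is why it addresses the maxmemory pressure directly.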

FastGeert commented 6 years ago

Ok, great.

dinosn commented 6 years ago

Reopening this: we still have warnings popping up. Please change the Redis max memory allocation to 1 GB during installation.
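For the installer, that would mean writing something like the following into the Redis configuration (a sketch; the exact config path used by OVC and whether an eviction policy should be set are assumptions):

```
# redis.conf
maxmemory 1gb
# keep the default explicit; OVC may prefer an eviction policy instead
maxmemory-policy noeviction
```

The same change can be applied to a running instance with `CONFIG SET maxmemory 1gb`, but only an installer-level change survives restarts across all environments.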

Currently we have to set it manually for all the environments. Issue https://docs.greenitglobe.com/gig/proj_gig_switzerland/issues/37