Closed atruskie closed 8 years ago
Okay further investigation into this issue is revealing some super interesting information.
Redis is currently using 4GB of RAM when idle.
Using https://github.com/gamenet/redis-memory-analyzer and https://redisdesktop.com/ I've worked out the following:
- `-1` which is why they do not get deleted
- `failed` queue (it is a LIST) for 7000 failed jobs uses 111MB of RAM (stores full payload, and exception stack trace)
- `failed` queue itself is not a problem

New issues created:
All of these effects are issues with storage after jobs have run. The partial payload idea may still be useful because enqueuing a massive number of jobs will still chew up memory. I plan to fix those bugs first, profile performance, and then complete partial payloads if necessary.
As detailed in: https://github.com/QutBioacoustics/baw-private/issues/110#issuecomment-152883850
Enqueuing a large batch of jobs (330K) creates as many entries in Redis. Currently, the entire job payload is duplicated even though 90% of it does not change. In particular, verbose configuration files (like in the `Towsey.Acoustic.yml` case) take up 3715/4239 chars, 88% of the payload. For each of the other 330K - 1 jobs, those 3715 bytes are redundant; that is 1.23GB of unnecessary RAM use on a Redis instance.

Investigate methods of storing the bulk of the payload (especially the config strings) only once. I'm thinking a Redis ~~hash~~ string value where the key is ~~the SHA of the config, and the value is the config~~ `{{namespace}}:{{object-type}}:{{id}}` and the value is the JSON-encoded rest of the payload. Jobs then do two Redis operations, one for the payload (done by resque) and one for the ~~config~~ rest of the payload (the invariant payload). An example key may look like: `baw-workers:analysis_job:123` or `baw-workers:analysis_job:system:index_genration`.
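A rough sketch of the split described above, in Python for brevity rather than the Ruby that resque itself uses. It is only an illustration under assumptions: the helper names (`invariant_key`, `split_payload`, `resolve_payload`) and the job fields (`id`, `file`, `config`) are hypothetical, and a plain dict stands in for Redis. With a real client the write would presumably be a single `SET` of the invariant part and the read a single `GET`. The `payload-base` field name comes from the requirements in this issue.

```python
import json

NAMESPACE = "baw-workers"  # namespace taken from the example keys in this issue


def invariant_key(object_type, object_id):
    """Build a key shaped like {{namespace}}:{{object-type}}:{{id}}."""
    return f"{NAMESPACE}:{object_type}:{object_id}"


def split_payload(payload, invariant_fields, object_type, object_id):
    """Split a full job payload into a shared part and a small per-job part.

    Returns (key, invariant_json, per_job_payload). The per-job payload
    carries the absolute key of the shared part under "payload-base".
    """
    key = invariant_key(object_type, object_id)
    invariant = {k: v for k, v in payload.items() if k in invariant_fields}
    per_job = {k: v for k, v in payload.items() if k not in invariant_fields}
    per_job["payload-base"] = key
    return key, json.dumps(invariant, sort_keys=True), per_job


def resolve_payload(per_job, store):
    """Rebuild the full payload: the worker's second lookup.

    `store` is any mapping from key to JSON string (a dict here,
    standing in for a Redis GET).
    """
    per_job = dict(per_job)  # do not mutate the caller's payload
    key = per_job.pop("payload-base")
    base = json.loads(store[key])
    return {**base, **per_job}


# Hypothetical job: the large config string is the invariant part.
store = {}
job = {"id": 1, "file": "a.wav", "config": "x" * 3715}
key, blob, small = split_payload(job, {"config"}, "analysis_job", 123)
store.setdefault(key, blob)  # stored once, shared by every job in the batch
restored = resolve_payload(small, store)
```

Each enqueued job then carries only the small per-job payload, and the 3715-character config is stored once per batch instead of once per job.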
Important requirements:
`payload-base`, will contain the absolute Redis key to the rest of the payload. E.g.