QutEcoacoustics / baw-workers

Workers that can process various long-running or intensive tasks.
Apache License 2.0

Optimization: Invariant payloads #31

Closed. atruskie closed this issue 8 years ago.

atruskie commented 8 years ago

As detailed in: https://github.com/QutBioacoustics/baw-private/issues/110#issuecomment-152883850

Enqueuing a large batch of jobs (330K) creates as many entries in Redis. Currently, the entire job payload is duplicated for every job even though about 90% of it never changes. In particular, verbose configuration files (as in the Towsey.Acoustic.yml case) take up 3715 of the 4239 characters, 88% of the payload. For all but one of the 330K jobs, those 3715 bytes are redundant: roughly 1.23 GB of unnecessary RAM use on a Redis instance.
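The waste estimate can be checked with a few lines of arithmetic. This assumes one byte per character (true for ASCII config text) and decimal gigabytes, and it ignores Redis's own per-key overhead, so the real figure is somewhat higher.

```python
jobs = 330_000          # jobs in the batch
invariant_chars = 3715  # characters of config text repeated in every payload
total_chars = 4239      # characters in one full payload

share = invariant_chars / total_chars
# Only one copy of the config is needed; every other copy is redundant
redundant_bytes = (jobs - 1) * invariant_chars

print(f"{share:.0%}")                     # 88%
print(f"{redundant_bytes / 1e9:.2f} GB")  # 1.23 GB
```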

Investigate methods of storing the bulk of the payload (especially the config strings) only once. I'm thinking of a Redis string value whose key is either the SHA of the config or a name of the form {{namespace}}:{{object-type}}:{{id}}, and whose value is the JSON-encoded rest of the payload. Jobs then do two Redis operations: one for their own payload (done by Resque) and one for the shared rest of the payload (the invariant payload). Example keys might look like baw-workers:analysis_job:123 or baw-workers:analysis_job:system:index_generation.
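A minimal sketch of the SHA-keyed variant, written in Python rather than the project's Ruby, with a plain dict standing in for Redis. A real version would presumably use Redis `SET` with the `NX` option so that only the first write is kept. The `baw-workers:config:` key prefix and the sample config text are hypothetical.

```python
import hashlib

# Plain dict standing in for Redis (assumption, not the project's storage layer)
store: dict[str, str] = {}


def store_config(config: str, namespace: str = "baw-workers") -> str:
    """Store a config string once, keyed by its SHA-256 digest, and return the key."""
    digest = hashlib.sha256(config.encode("utf-8")).hexdigest()
    key = f"{namespace}:config:{digest}"  # hypothetical key layout
    # setdefault keeps the first write, similar to Redis SET ... NX
    store.setdefault(key, config)
    return key


# Hypothetical stand-in for a large YAML config shared by every job
towsey_config = "setting: value\n" * 200
keys = [store_config(towsey_config) for _ in range(1000)]
print(len(store))  # 1
```

Because the key is derived from the content, identical configs collapse to one entry automatically, and two batches with different configs can never overwrite each other.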

Important requirements:

{
  "uuid": "7d78905e-5ea9-4a95-b568-0f14c2f37e4c",
  "id": 1234,
  "payload-base": "baw-workers:analysis_job:123"
}
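A sketch of how a worker might rebuild its full payload from a slim payload shaped like the one above. Again a dict stands in for Redis, and the invariant field names (`analysis_name`, `config`) are hypothetical; the real payload's fields may differ.

```python
import json
import uuid

# Plain dict standing in for Redis (assumption)
redis_stub: dict[str, str] = {}


def enqueue_slim(job_id: int, base_key: str, invariant: dict) -> dict:
    """Store the invariant payload once and return the slim per-job payload."""
    # setdefault keeps the first write, similar to Redis SET ... NX
    redis_stub.setdefault(base_key, json.dumps(invariant))
    return {"uuid": str(uuid.uuid4()), "id": job_id, "payload-base": base_key}


def rehydrate(slim: dict) -> dict:
    """Rebuild the full payload a worker would see."""
    base = json.loads(redis_stub[slim["payload-base"]])
    per_job = {k: v for k, v in slim.items() if k != "payload-base"}
    # Per-job fields win over shared fields if the names ever collide
    return {**base, **per_job}


# Hypothetical invariant fields shared by the whole batch
invariant = {"analysis_name": "Towsey.Acoustic", "config": "setting: value\n" * 200}
jobs = [enqueue_slim(i, "baw-workers:analysis_job:123", invariant) for i in range(3)]
full = rehydrate(jobs[1])
print(len(redis_stub), full["id"], full["analysis_name"])  # 1 1 Towsey.Acoustic
```

The slim payload stays a few dozen bytes no matter how large the config is. One thing the sketch leaves open is when the shared key can be deleted: it must outlive every job that references it.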

atruskie commented 8 years ago

Okay, further investigation into this issue has revealed some very interesting information.

Redis is currently using 4GB of RAM when idle.

Using https://github.com/gamenet/redis-memory-analyzer and https://redisdesktop.com/ I've worked out the following:

New issues created:

All of these effects are storage issues that arise after jobs have run. The partial payload idea may still be useful, because enqueuing a massive number of jobs will still chew up memory. I plan to fix those bugs first, profile performance, and then complete partial payloads if necessary.