Explore mongo vs ijod resource usage

temas commented 12 years ago

There are a few facets to this. The first is isolating the core processing from everything else. This is done by disabling all of the Collections and making the synclet manager respond immediately with saved data. Once the processing is isolated, we can measure how many resources it currently uses to process synclet results. Then, switching to the ijod branch, we can run the same processing against the same data and check the results. Analysis and comparison will take place here.
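
For illustration only, a minimal sketch of how the measurement step could look in Node. The `measure` helper and the fake `savedResults` data are hypothetical, not code from either branch; `process.hrtime.bigint()` and `process.memoryUsage()` are standard Node calls, and the former assumes a newer Node than the one this thread targets.

```javascript
// Hypothetical helper: run one processing step and report time and memory deltas.
function measure(label, fn) {
  const startMem = process.memoryUsage();
  const startTime = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - startTime) / 1e6;
  const endMem = process.memoryUsage();
  return {
    label,
    result,
    elapsedMs,
    // Deltas can be negative if the garbage collector runs during the step.
    rssDeltaBytes: endMem.rss - startMem.rss,
    heapUsedDeltaBytes: endMem.heapUsed - startMem.heapUsed,
  };
}

// Stand-in for saved synclet results; a real run would replay the stored data.
const savedResults = Array.from({ length: 1000 }, (_, i) => ({ id: i, text: "item " + i }));

const report = measure("process saved results", () =>
  savedResults.map((r) => JSON.stringify(r)).length
);
console.log(report.label, report.elapsedMs.toFixed(2) + " ms", report.rssDeltaBytes + " bytes rss");
```

Running the same helper on master and on the ijod branch with identical saved data would give numbers that can be compared directly, though a single run is noisy and several repetitions are needed before drawing conclusions.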

temas commented 12 years ago

A summary of this was given in the Singly engineering meeting this week, but let me try to TL;DR it.

While investigating this I came to some important realizations that overtook this work. Primarily, we need to be better about batching our operations rather than serializing them. IJOD and master were neck and neck in general performance until I began working on the batching. Batching only the writes gave a 50-60% decrease in work time. Once I got that file write batching working, I found some massive RAM usage issues and began investigating them.
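
To make the batching idea concrete, here is a rough sketch, not the code from the branch. `createBatcher`, `writeBatch`, and `maxSize` are hypothetical names, and the sink just records calls in memory where a real one would append to a file.

```javascript
// Hypothetical sketch: collect entries and hand them to the sink once per
// batch instead of once per entry.
function createBatcher(writeBatch, maxSize) {
  let pending = [];

  function flush() {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    writeBatch(batch);
  }

  function add(entry) {
    pending.push(entry);
    if (pending.length >= maxSize) flush();
  }

  return { add, flush };
}

// In-memory sink standing in for a single file append per batch.
const writes = [];
const batcher = createBatcher((batch) => {
  writes.push(batch.map((e) => JSON.stringify(e)).join("\n") + "\n");
}, 3);

for (let i = 0; i < 7; i++) batcher.add({ id: i });
batcher.flush(); // write out whatever is left over

console.log(writes.length + " writes for 7 entries");
```

With a batch size of 3, seven entries turn into three sink calls instead of seven. The trade-off is that pending entries sit in memory until the batch is flushed, so the batch size has to be bounded.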

Just today I took this one step further, and the batching is now much more complete, leading to even more substantial gains. This was further helped by using a murmurhash3 of the JSON data to find duplicates more quickly and avoid reprocessing them. I also moved ijod off of the native 0.6 zlib binding and onto the simpler compress-buffer. This has alleviated all of the RAM usage issues. Currently the ijod branch is showing amazing performance gains and just needs further testing before it can be merged.
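
The duplicate check could look roughly like the sketch below. To keep it self-contained it uses a small FNV-1a hash as a stand-in for murmurhash3, and `createDedupFilter`, `filterChanged`, and the `{ id, data }` entry shape are assumptions for illustration, not the branch's actual API.

```javascript
// FNV-1a (32-bit) over UTF-16 code units. This is a stand-in for murmurhash3,
// chosen only so the example needs no external module.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Hypothetical filter: remembers the last hash seen per id and returns only
// the entries whose serialized JSON changed since the previous call.
function createDedupFilter() {
  const lastHash = new Map();
  return function filterChanged(entries) {
    const changed = [];
    for (const entry of entries) {
      const hash = fnv1a(JSON.stringify(entry.data));
      if (lastHash.get(entry.id) === hash) continue; // unchanged, skip reprocessing
      lastHash.set(entry.id, hash);
      changed.push(entry);
    }
    return changed;
  };
}

const filterChanged = createDedupFilter();
const firstPass = filterChanged([
  { id: "a", data: { n: 1 } },
  { id: "b", data: { n: 2 } },
]);
const secondPass = filterChanged([
  { id: "a", data: { n: 1 } }, // same JSON as before
  { id: "b", data: { n: 3 } }, // changed
]);
console.log(firstPass.length, secondPass.length);
```

Two caveats apply to this sketch: `JSON.stringify` depends on key order, so the same data serialized with a different key order hashes differently, and a 32-bit hash can collide, in which case a changed entry would be skipped by mistake.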

temas commented 12 years ago

https://github.com/LockerProject/Locker/pull/914

LockerProject / Locker, issue #902