chop-dbhi / avocado

Metadata APIs for Django

http://avocado.harvest.io

Other

41 stars 10 forks source link

Refactor formatter to reduce overhead #306

Closed bruth closed 9 years ago

bruth commented 9 years ago

A call to formatter(...) was doing a lot of work for every row being processed. The primary slow spots are:

Format methods being dynamically accessed via getattr() on every loop
Unpacking keyword arguments (**kwargs) in the method call
Building and moving OrderedDict values
Logging a message when a value failed be coerced

To remediate these issues, targeted format methods are now processed at initialization rather that process time. This requires the list of formats to be passed into the constructor and not in the call method.

Keyword arguments passed in during a call are no longer unpacked in the method call, but are passed as a keyword argument itself, kwargs.

Use of OrderedDict has been removed altogether. These structures take up a lot of memory and have proven to add little value during processing. This required adding a method for getting the header/field data for the values being processed. This is also a desirable change for addressing

99 and #255.

Logging has been addressed by only doing while in DEBUG mode. Since the exception is skipped, there is no point in logging the errors while in production.

The exporters have been updated to accommodate this changes. In addition, new read methods have been implemented for performance testing including cached and threaded versions. The previous read method have been renamed to manual_read since it performs manual limit and offsetting and distinct checking.

To accommodate the new read method, the write on all exporters have been changed to not call read itself. This change requires the passed iterable to already contain the formatted rows.

Note: This refactor requires backwards incompatible changes to get the performance gains.

Signed-off-by: Byron Ruth b@devel.io

coveralls commented 9 years ago

Coverage increased (+0.05%) to 81.42% when pulling d79d638a6523dd51abcbc74751e5add2c49fdcad on refactor-export into 4d9dc30ee2914227cc4a856f9ab8390a011bae48 on master.

coveralls commented 9 years ago

Coverage increased (+0.14%) to 81.53% when pulling cfbfe1555cccdf684faefedd533d8919b542c2b1 on refactor-export into d43ccc940cb932d0284c35447abab67fb09d68a9 on master.

coveralls commented 9 years ago

Coverage increased (+0.14%) to 81.53% when pulling cfbfe1555cccdf684faefedd533d8919b542c2b1 on refactor-export into d43ccc940cb932d0284c35447abab67fb09d68a9 on master.

coveralls commented 9 years ago

Coverage increased (+0.14%) to 81.53% when pulling cfbfe1555cccdf684faefedd533d8919b542c2b1 on refactor-export into d43ccc940cb932d0284c35447abab67fb09d68a9 on master.

naegelyd commented 9 years ago

Use of OrderedDict has been removed altogether. These structures take up a lot of memory and have proven to add little value during processing. This required adding a method for getting the header/field data for the values being processed.

I'm wondering whether you meant to say that OrderedDict was removed from a specific area of the code because it doesn't seem to be removed altogether. See the top 75 lines of avocado/formatters.py as an example of it still in use. Was this just meant to mean it was removed from the formatter call? Just looking for clarification of the description here, not really any change to the code.

coveralls commented 9 years ago

Coverage increased (+0.1%) to 81.48% when pulling 0a09b59d23e817f80d3ae3a001abfd9a2588c0e2 on refactor-export into 9e28c3d7c147dab6275c47207f871dd7d18459e6 on master.

naegelyd commented 9 years ago

@bruth Looked over the newest changes. Once the build is passing, I'd say this is ready to merge.

bruth commented 9 years ago

Thanks. I re-ran the failed build and it passed this time. I've noticed it can sometimes fail on a cache miss with memcache. Since it's sporadic, I have not taken the time to figure out of why that happens.