PDX-Capstone-Team-C / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
http://scrapy.org
BSD 3-Clause "New" or "Revised" License

XDelta Encoding / Buffer Size #16

Open mjsiegfried opened 8 years ago

mjsiegfried commented 8 years ago

Come up with a way to determine the buffer size for each case.

sgarciapdx commented 8 years ago

Easy solution: pack up the length of the serialized version of the target response with the delta as a tuple, something like:

# on store
target_as_string = self._serialize(target)
# self._encode_response(...) returns a tuple
(delta, original_target_length) = self._encode_response(target_as_string, source_as_string)
# do the storage work

# on retrieve
# get the payload from the db, then send it into the decode function
# use original_target_length as the buffer size inside _decode_response
restored_target = self._decode_response(payload, source_as_string)

We'll store one extra integer with each delta, but we'll save the cost of having to compute the buffer size on every retrieval. I think it's a fair tradeoff.