MaddieM4 / pymads

A fork of the pymds authoritative DNS server, designed for asynchronous lookup without domain restrictions.
GNU Lesser General Public License v3.0

TTL restriction in cache filter #41

Closed: pczarn closed this issue 10 years ago

pczarn commented 11 years ago

Frankly, it shouldn't work like this. TTL works per record, whereas the cache retrieves lists of records.

MaddieM4 commented 11 years ago

The original code is kinda shitty, yeah. More of a proof-of-concept than anything. But the new code misses some of the design purpose of that original code, and I'm not sure what the "correct" solution is.

Our intention

To prevent, if possible, a potentially costly call to self.source(). The idea is that you can stick this filter on the end of expensive or high-latency chains.

Original version

Cache the entire result set indefinitely, regardless of TTL (or for that matter, qtype). Definitely wrong.

New version

Cache the entire result set indefinitely, but only return things out of that result set that are not TTL-expired. This means it will eventually return an empty list, because we still have a set of expired things cached. And you can imagine that this gets interesting when some records expire, and some don't.
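A toy illustration of the shrinking-result problem (made-up records and attribute names, not pymads code):

```python
import time
from collections import namedtuple

# Hypothetical record type, just to show the effect of filtering by TTL.
Record = namedtuple("Record", ["name", "rtype", "rdata", "expires_at"])

cached = [
    Record("example.com", "A", "192.0.2.1", time.time() + 1),    # short TTL
    Record("example.com", "A", "192.0.2.2", time.time() + 300),  # long TTL
]

def fresh(records):
    # What the new version does: filter an indefinitely cached list by TTL.
    return [r for r in records if r.expires_at > time.time()]

print(len(fresh(cached)))  # 2 right away
time.sleep(2)
print(len(fresh(cached)))  # 1 once the short-TTL record expires;
                           # when everything expires, this drops to 0
```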

Possible solution

Cache the entire list according to the smallest TTL; discard cached record list entirely and retrieve from self.source() if any record is expired. Also, be more intelligent about cache keys, so that we're not conflating different qtype and qclass values. This is probably just a matter of using (name, qtype, qclass) tuples as cache dict keys.

Noteworthy limitation: if a record list is cached, new records will not show up until at least one old record has expired. So we may want to add an optional param for maximum list-TTL.

I think this makes sense and covers the necessary bases, but I'd love your take on it, to make sure I'm not missing anything.
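Roughly, I'm picturing something like the following. This is just a sketch to illustrate the idea, not real pymads code; the class name, the source callable, and the record attributes are placeholders:

```python
import time

class CacheFilterSketch(object):
    """Cache whole result lists keyed by (name, qtype, qclass),
    expiring each list according to its smallest record TTL."""

    def __init__(self, source, max_list_ttl=None):
        self.source = source          # callable: (name, qtype, qclass) -> list of records
        self.max_list_ttl = max_list_ttl
        self.cache = {}               # (name, qtype, qclass) -> (expires_at, records)

    def get(self, name, qtype, qclass):
        key = (name, qtype, qclass)
        entry = self.cache.get(key)
        if entry is not None:
            expires_at, records = entry
            if time.time() < expires_at:
                return records
            del self.cache[key]       # any record expired: drop the whole list

        records = self.source(name, qtype, qclass)
        if records:
            # Records are assumed to carry a relative TTL in seconds (rttl).
            list_ttl = min(r.rttl for r in records)
            if self.max_list_ttl is not None:
                list_ttl = min(list_ttl, self.max_list_ttl)
            self.cache[key] = (time.time() + list_ttl, records)
        return records
```

The optional max_list_ttl is the param mentioned above, so new records can show up even before the oldest cached record expires.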

pczarn commented 11 years ago

The solution you propose is a good compromise. It could be improved by taking qtype further into consideration. For example, cache the list according to the smallest TTL among records where rtype == qtype. Also, fetch new records if there are no records of the requested qtype.

MaddieM4 commented 11 years ago

The problem with that is the (seemingly very reasonable!) assumption that if we request a type of record, all the returned records will be of that type. There are a surprising number of real-life cases where that just isn't true. While each request with the same (question, rtype, rclass) tuple should return basically the same list of result records, there is a very high chance that some or all of those records do not actually match the rtype.

The second thing, I would think, is handled automatically by using (domain, rtype, rclass) tuples as cache keys. So if you have a cached list for ("somedomain.com", "A", "IN") and you get a request for ("somedomain.com", "AAAA", "IN"), we create, store, and return a new cached list from self.source(), based on that request.
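Concretely, with tuple keys it would behave something like this (made-up values, not pymads code):

```python
cache = {}

# A cached answer for the A question; the list itself may contain
# records of other types (e.g. a CNAME), which is fine.
cache[("somedomain.com", "A", "IN")] = ["<whatever source returned for A>"]

key = ("somedomain.com", "AAAA", "IN")
if key not in cache:
    # Cache miss: the AAAA request never touches the cached A list.
    # We'd call self.source() for this question and store the result
    # under its own key.
    cache[key] = ["<whatever source returns for AAAA>"]
```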

pczarn commented 11 years ago

Of course, sometimes there's not even a single record that matches the qtype. We could decrease list-TTL for these cases, because we could be asked for a record that's not there yet.

I understand how the second thing works.

MaddieM4 commented 11 years ago

I'm just saying that by design and convention, the results of a query are arbitrary-but-consistent, and may intentionally never include any records of the requested qtype (dig -t A uppit.us). It isn't a matter of "no relevant records yet", just a matter of "results are arbitrary", so our design shouldn't assume that the correct result set will or should contain records with the requested qtype.

I'd much rather that the design work consistently for any result set, than try to special-case according to rules that no real-world servers are following.

pczarn commented 10 years ago

I had forgotten to request these records twice in the tests. Sorry it took so long.

I learned quite a lot working on this project.

PS. "The original code is kinda shitty, yeah." I was referring to my own code.

MaddieM4 commented 10 years ago

No worries. I've been busy as heck myself. I'm going to try to look through the actual diffs soon (tomorrow, maybe), but in the meantime, do you mind summarizing the changes since our last exchange of comments?

pczarn commented 10 years ago

The packed request question is used as the cache key. The cache is invalidated when any record has expired (datetime.now() >= r.ttl).
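A simplified sketch of that check (illustrative only, not the exact code; here a record's ttl is an absolute expiry datetime, per the comparison above):

```python
from datetime import datetime

def lookup(cache, source, packed_question):
    records = cache.get(packed_question)
    if records is not None and any(datetime.now() >= r.ttl for r in records):
        # A single expired record invalidates the whole cached list.
        del cache[packed_question]
        records = None
    if records is None:
        records = source(packed_question)
        cache[packed_question] = records
    return records
```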

To test this comparison, we request record sets ([expired], [expired, record1], [record3]). Since expired.rttl=0, we get ([], [], [record3]) after clearing out the source in the meantime.

MaddieM4 commented 10 years ago

Just went over the diffs, this looks beautiful and the tests pass, so I'm merging. Awesome job, @pczarn!