Closed dmchurch closed 8 years ago
Thanks for the PR, makes a lot of sense. I'll try to review the code within a day or two!
Apologies for the massive delay. This looks good to me, and I have a couple of minor suggestions. Would you be willing to make a few changes? Otherwise I can merge and make the changes myself.
Sure, I can make some changes! What did you have in mind?
I've also been doing some other work on my local repo, as I've been integrating it into some professional work I've been doing, and I'd be happy to get that sorted for upstream too once this is in.
More contributions very much appreciated
FileSource
and HTTPSource
use *args
in addition to **kwargs
in __init__
?fetch
logic which might be wrong, depending on what we're going for
fetch_needed(start_sector + fetch_sectors - need_start)
, but when fetch_sectors > n_sectors
, we haven't checked if the sectors we're downloading to satisfy self.min_fetch
are already cached, which means we could download data that's already cachedself.min_fetch
. I'm not sure if this is wrong or not.*args
in there is because my general feeling around positional vs. keyword arguments in Python is that positional should only be used for things which are intrinsic to the behavior of the function/class (like the pathname/URL), and anything shaped like an "optional tweak" should be keyword-only, otherwise it makes the code less readable at the calling location. If this were a Python 3 library, I'd have put a *
after the self
argument to Source.__init__
to enforce it there, as well. If you'd prefer for people to be able to call FileSource
and HTTPSource
with cache_content
and min_fetch
as positional arguments, though, I can add *args
.min_fetch
value). It seemed to me like continuing the sector search beyond the end of the seek range to ensure that we never refetch sectors would add needless complexity to the code, especially since we'd have to extend the xrange of the sector
for loop. It's a pretty rare edge case, anyway -- it only occurs when the initial part of this seek was satisfied by a prefetch and there is another fetched area very shortly past the end of this seek, and that only happens when doing out-of-order fetches. If you think it's worth it, though, I can add a check to ensure we don't overfetch on this.OK, that all makes sense to me.
Thanks again for this!
A few improvements to the fetch behavior, to minimize number of fetches (causing seek latency for local files, or excess HTTP header traffic) and to minimize memory usage for the ISO object by not caching backing sectors for file content.