Original comment by daygloje...@gmail.com
on 3 Feb 2011 at 11:15
I forgot to say which build this was discovered in...
Version: 0.7
Build: r1004
Original comment by daygloje...@gmail.com
on 4 Feb 2011 at 7:49
What web server are you running? Are any normal modules turned off?
If your web server refuses the byte-range request you will see this behaviour.
Munki doesn't currently catch exit code 33 - it is a special case that prevents
use of the existing partial download.
"curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume."
I'm not seeing this fault against my webserver (Apache2 on Linux) or against
Akamai when testing with the underlying curl commands. I also don't see this
behaviour when doing functional testing with munki (kill curl part way through
a managedsoftwareupdate run). 0.7.0 Build 1005.
While looking at this I noticed a different bug which I will think about:
resuming partials does not currently use the etag to ensure it is resuming the
same version of the file. The etag is currently only used to check that
previously fully-downloaded files are still fresh.
Original comment by rrmiddleton@gmail.com
on 5 Feb 2011 at 11:46
It's a stock Mac OS X Server (Apache/2.2.15), but we are utilizing munki server
(https://github.com/jraine/munkiserver) to host the repo. This is a Rails app
using Passenger/Apache, but byte-range requests should be handled by Apache, no?
I am fairly certain we have not disabled any standard modules.
Still, this may be specific to our environment.
Original comment by daygloje...@gmail.com
on 6 Feb 2011 at 2:23
After a bit of reading, I think this may be a defect in munki server or a
limitation of Passenger.
I am going to change the status of this issue to Low while we review this.
Thanks.
Original comment by daygloje...@gmail.com
on 6 Feb 2011 at 2:33
It will be a defect with munki-server. Apache & Passenger are just acting as a
proxy to a stream in this case. It is up to the dynamic code to handle all
headers when acting as a stream. While this is a bug in munki too, munki really
does need the server to support byte-range requests when downloading large
payloads - particularly if serving to laptops that might change wifi access
points and thus be trashing TCP connections as they move, or to remote clients
on slower links.
Apache can only do byte-range if it is the program accessing the file from disk
(and thus knows everything about the file, can start reading at any point, and
can generate an eTag that it trusts).
Note - munki does not expect byte-range to be supported for manifests or
catalogs (resume is not set to true when they are fetched in updatecheck). Thus
these are easy to generate with dynamic code, with no need to worry about
implementing partial requests. I expect in most cases pkgs can (and should)
happily be served by an appropriately protected Apache directory.
Opinion on static content / dynamic web-apps / auth:
I currently do auth for pkgs with client certificates (thus at the Apache
auth-wrapper level, not the application level). In other applications I write
auth at the mod_perl or mod_python level (PerlAuthenHandler, PerlAuthzHandler);
this allows custom auth while still letting Apache do all the other fancy HTTP
stuff for static content. It also substantially improves security for dynamic
apps - the underlying application code cannot be hit by non-authenticated
clients, who can only reach the authentication wrapper (tiny and auditable).
Original comment by rrmiddleton@gmail.com
on 6 Feb 2011 at 3:31
Great summary, Rob. Thanks.
Do you think it would be advisable to add some contingency code that removes the
partial and attempts a re-download in cases where it cannot resume?
Original comment by daygloje...@gmail.com
on 6 Feb 2011 at 4:38
Brian: see if r1026 addresses this issue for your server. Expected behavior is
that it attempts to resume, catches the 33 error code from curl and removes the
partial download. The next download attempt should then succeed.
So you'll see one failed attempt to resume the download. The next attempt
should start over from scratch.
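In rough outline (this is just a sketch of the behaviour, not the actual r1026
code - the wrapper function and its arguments are made up):

    import os
    import subprocess

    # curl exit status 33: "HTTP server doesn't seem to support byte ranges."
    CURL_ERR_RANGE_NOT_SUPPORTED = 33

    def download_with_resume(url, destinationpath):
        '''Hypothetical wrapper: try to resume an existing partial download
        with curl, and throw away the partial if the server refuses the
        byte-range request so the next attempt starts over from scratch.'''
        cmd = ['/usr/bin/curl', '--fail', '--silent', '--show-error',
               '--continue-at', '-',     # resume from wherever the partial ends
               '--output', destinationpath, url]
        retcode = subprocess.call(cmd)
        if retcode == CURL_ERR_RANGE_NOT_SUPPORTED:
            # Server can't resume: remove the partial download.
            if os.path.exists(destinationpath):
                os.remove(destinationpath)
        return retcode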
Original comment by gregnea...@mac.com
on 8 Feb 2011 at 6:33
I think that ought to do it. Thanks, Greg.
Behavior is now: attempt a resume; on error, fail. On the subsequent run, attempt
a new download.
One problem: I noticed that r1026 was not returning its build number...
[11:02:33]van-daphne# managedsoftwareupdate -V
0.7.0 Build svn
I wanted to begin deploying the 0.7r1004 build today. Any reason I should not
sub the r1026 build in its place?
Original comment by daygloje...@gmail.com
on 8 Feb 2011 at 7:06
Greg, I'd be tempted to log a specific warning if a byte-range request fails
(error 33) before throwing the exception.
Perhaps munki should expect these minimum capabilities from a web server:
- resume / byte-range requests (for pkgs, but not necessarily for manifests or
catalogs)
- etag returned in all cases (full or partial download)
If the web server fails on one of these, munki should log a warning. These cause
less-than-ideal behaviour in munki, and it could be good to have a 'don't blame
munki - your web server is broken' warning in the munki logs.
Additionally - if there is no objection, I might implement saving the etag of a
partial, and then only resuming a partial if the etag matches. Currently munki
will resume a partial without checking whether the file on the server has
changed. The SHA checksum will of course stop any major issue, but it would be
good to not resume a download when we know it will be corrupt.
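To make the idea concrete, a minimal sketch (the xattr name and helper functions
are made up for illustration, and this assumes the xattr module available with
OS X's Python): store the server's etag in an extended attribute on the partial
file and only resume when it still matches.

    import xattr

    # Hypothetical attribute name, for illustration only.
    ETAG_XATTR = 'com.example.munki.etag'

    def save_etag(path, etag):
        '''Record the server-supplied etag on the partially downloaded file.'''
        xattr.setxattr(path, ETAG_XATTR, etag)

    def can_resume(path, current_etag):
        '''Resume only if the stored etag matches what the server reports now.'''
        try:
            stored_etag = xattr.getxattr(path, ETAG_XATTR)
        except (KeyError, IOError):
            return False    # no stored etag; safest to start the download over
        return stored_etag == current_etag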
Original comment by rrmiddleton@gmail.com
on 8 Feb 2011 at 7:25
These sound like reasonable changes. I'm currently in panic mode, getting
ready for a trip to Oslo, Norway. Would you like to take a crack at
implementing these?
I would not want munki to log either of these warnings more than once a
session; it would be annoying to see dozens of "WARNING: HTTP server foo does
not support Etags" in the logs.
Original comment by gregnea...@mac.com
on 8 Feb 2011 at 7:46
Attached patch enhances a few things:
- if an etag is present in a partial, only resume the download if it matches
- if resume is not possible, immediately retry a full download
- log if the etag is missing for downloads we expect to be resumable (i.e. pkgs)
- log if range / resume fails
- delete the partial download in a few other error cases
- actually log the curl exit code / error string on all download failures
- don't crash munki if the curl error string is not in the expected form (see the
sketch below)
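For that last item, the parsing is roughly this shape (a sketch only, not the
patch itself): pull the exit code and message out of curl's stderr defensively
and fall back rather than raise if the line is not what we expect.

    import re

    def parse_curl_error(stderr_text):
        '''Parse a line like
        curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.
        Returns (exit_code, message); degrades gracefully on unexpected input.'''
        match = re.search(r'curl: \((\d+)\)\s*(.*)', stderr_text)
        if match:
            return int(match.group(1)), match.group(2).strip()
        # Unexpected output: don't blow up, just report what we got.
        return None, stderr_text.strip()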
The code is tested against Apache - with various forms of failures causing
partials, with etags on/off and range capability on/off. All working without
crashes or loops.
TODO comments are left in the code where I'm not certain of the best behaviour.
I would appreciate comments before committing (without quite so many code
comments). No hurry.
Slight defect - the etag xattr could probably be written as soon as we finish
with the headers (don't wait for curl to exit). However, I'm not certain of the
exact moment at which curl creates the output file.
The code was already pretty solid for everything but the various possibilities
that might cause curl to fail repeatedly with partials & servers that don't
deal with the HTTP/1.1 features we want.
Original comment by rrmiddleton@gmail.com
on 12 Feb 2011 at 3:58
Hey, Rob -- just now got a chance to look at this. I agree in concept with the
changes, but can't get the patch to apply to the current version of
updatecheck.py -- parts of the patch apply, and others are rejected. Can you
re-do this against the current version of updatecheck.py?
More specific comments:
I'm all for providing more info when a download fails. (Lines 119-124 of the
patch)
I like the conversion of a resume to a full download if the server doesn't
appear to support byte range requests. (Lines 80-98 of patch)
...and you've neatly solved the issue of repeat logging of Etag/Byte-range
request errors.
-Greg
Original comment by gregnea...@mac.com
on 25 Feb 2011 at 6:36
Hi Greg, hope Europe treated you well.
Patch against r1047 attached. Looks like some end-of-line whitespace changed
recently, hence some spurious patch lines.
Note I am now uncertain about adding support for etag checking on resuming. The
patch provides for deleting the partial rather than resuming if the etag does
not match.
The changes discussed in issue 60 mean etag may not be so useful. If a partial
download has happened from one http server (or a file:// server) should we be
able to resume from another source (future code)? If so, the only thing we
expect to be the same between copies is the modified time and size (and the sha
checksum which we don't know until we complete the download).
Perhaps the original behaviour is simplest here -- if there is a partial
download check nothing, just try to keep downloading. Fancy tricks won't hold
so well in the face of ad-hoc clustered servers for pkgs (either multiple
http(s) or file).
Original comment by rrmiddleton@gmail.com
on 25 Feb 2011 at 10:15
Patch committed in r1051. We can revisit the resume vs etag as we integrate the
file:// stuff.
Original comment by gregnea...@mac.com
on 2 Mar 2011 at 7:21
Brian - is this issue fixed?
Original comment by gregnea...@mac.com
on 22 Apr 2011 at 3:33
No reply from Brian, but I believe this is fixed.
Original comment by gregnea...@mac.com
on 12 May 2011 at 5:12
Original issue reported on code.google.com by
daygloje...@gmail.com
on 3 Feb 2011 at 11:13