ElliotKillick / Mido

The Secure Microsoft Windows Downloader
https://elliotonsecurity.com
MIT License

win7x64-ultimate iso redownloaded every time the script is run with 'all' argument #4

Open sysfu opened 1 year ago

sysfu commented 1 year ago

So far as I can tell, mido already successfully downloaded the win7x64-ultimate.iso file on the first run. I'm re-running mido to complete a few remaining partial downloads of win2022-eval.iso and win11x64-enterprise-eval.iso.

Is the sha256 hash of each ISO checked for before making the decision to redownload a file or only afterwards?

Current hash of win7 iso

sha256sum win7x64-ultimate.iso
dec04cbd352b453e437b2fe9614b67f28f7c0b550d8351827bc1e9ef3f601389  win7x64-ultimate.iso

win7x64-ultimate.iso sha256 hash as found in the mido.sh script

dec04cbd352b453e437b2fe9614b67f28f7c0b550d8351827bc1e9ef3f601389

Here's a current directory listing:

total 52638916
-rwxr-xr-x 1 user user      38937 Aug  8 12:10 mido.sh
-rw-r--r-- 1 user user 5550497792 Aug  9 03:41 win10x64-enterprise-eval.iso
-rw-r--r-- 1 user user 4898582528 Aug  8 21:10 win10x64-enterprise-ltsc-eval.iso
-rw-r--r-- 1 user user 5088602112 Aug  8 20:38 win11x64-enterprise-eval.iso
-rw-r--r-- 1 user user 1001123840 Aug  9 03:47 win11x64-enterprise-eval.iso.PART
-rw-r--r-- 1 user user 3166840832 Aug  8 21:35 win2008r2.iso
-rw-r--r-- 1 user user 4542291968 Aug  8 22:17 win2012r2-eval.iso
-rw-r--r-- 1 user user 6972221440 Aug  8 23:10 win2016-eval.iso
-rw-r--r-- 1 user user 5652088832 Aug  8 23:43 win2019-eval.iso
-rw-r--r-- 1 user user 4778459128 Aug  9 00:12 win2022-eval.iso.PART
-rw-r--r-- 1 user user 5876357120 Aug  8 18:37 win7x64-ultimate.iso
-rw-r--r-- 1 user user 2413605000 Aug  9 09:05 win7x64-ultimate.iso.PART
-rw-r--r-- 1 user user 3961473024 Aug  9 03:09 win81x64-enterprise-eval.iso

sysfu commented 1 year ago

As the script progresses, it looks like it's redownloading the win10x64-enterprise-eval.iso as well, so I assume it's going to re-download every other iso that was completely downloaded on the prior run.

ElliotKillick commented 1 year ago

If you specify the all argument (or any argument when the requested media has already been fully downloaded) then Mido will attempt to download it again as you requested. Mido has no way to know if the already successfully downloaded media has been updated on the server since the last download so this behavior seems reasonable to me. Microsoft doesn't provide any mechanism (AFAIK) to check if the media has been updated in advance of downloading.

Perhaps it would be possible to check the Last-Modified or ETag HTTP response headers to see if the file has changed since the last successful download (do MS servers send these response headers?) and download the new version only if it has. However, this means we would then have to write that data somewhere, making Mido less portable. Also, I'm not sure if curl has support for this or not (if it has some built-in functionality for this, then perhaps we could just throw it in as an option to the curl command in the scurl_file function).

If I understand your ask correctly, I'm not sure implementing a feature to make Mido work how you would expect it to would be worth the complexity it introduces.

sysfu commented 1 year ago

> Mido has no way to know if the already successfully downloaded media has been updated on the server since the last download so this behavior seems reasonable to me.

The mido.sh script already contains the sha256 hash values for all iso files. Would it be a headache to test for the existence of each iso, compare its hash, and only download if either of those two checks failed?

From my lowly end user standpoint, it's counterintuitive and a waste of time and bandwidth to re-download existing and known good (and large!) iso files every time the command is run.

Perhaps an additional argument such as allfresh could be created that bypasses the above proposed file checks and forces re-downloading of all iso files.
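A minimal sketch of the check proposed above, assuming sha256sum is available (as in the listing earlier in this thread). The function name needs_download and its third "force" parameter are hypothetical and not part of mido.sh; the force parameter stands in for the suggested allfresh argument.

```shell
#!/bin/sh
# Hypothetical helper, not part of mido.sh.
# Exit status 0 means "download needed", 1 means "existing file can be kept".
#   $1 = path to the ISO
#   $2 = expected SHA-256 hash (hex)
#   $3 = "yes" to force a re-download (stand-in for an "allfresh"-style option)
needs_download() {
    file="$1"
    expected="$2"
    force="${3:-no}"

    # Forced refresh: skip all checks.
    if [ "$force" = "yes" ]; then
        return 0
    fi

    # Test 1: does the file exist at all?
    if [ ! -f "$file" ]; then
        return 0
    fi

    # Test 2: does the hash of the existing file match the expected value?
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" != "$expected" ]; then
        return 0
    fi

    return 1
}

# Example call using the file name and hash quoted earlier in this thread.
if needs_download "win7x64-ultimate.iso" \
    "dec04cbd352b453e437b2fe9614b67f28f7c0b550d8351827bc1e9ef3f601389"; then
    echo "win7x64-ultimate.iso: download needed"
else
    echo "win7x64-ultimate.iso: present and verified, skipping"
fi
```

Note that hashing a multi-gigabyte ISO takes a while, though far less than downloading it again.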

I'll contribute $50 towards this feature.

ElliotKillick commented 1 year ago

The problem with relying on the SHA-256 hashes is that those values are hardcoded into the script itself. They need to be manually updated when, for example, Microsoft issues a new release for win10x64. They therefore cannot be used as markers for whether or not the most up-to-date release of an ISO has been downloaded.

When originally writing Mido, I wondered whether I should include the all option at all, in case someone mistook it for being more "intelligent" than it really is, as you did. I'm leaning towards simply removing the all option.

ElliotKillick commented 1 year ago

For saving bandwidth, perhaps the desired functionality could be implemented like this: https://www.cyberciti.biz/faq/linux-unix-curl-if-modified-since-command-linux-example/

We would have to verify that MS servers will cooperate with the If-Modified-Since header though. We could also use the last modification time of each already downloaded ISO to avoid non-portable writing of that timestamp to a separate file. One can also get the HTTP status returned by curl using -w "%{http_code}" (to detect 304 Not Modified).
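A rough sketch of how this could look, using curl's --time-cond option (which takes the modification time from a local file and sends it as If-Modified-Since) together with -w "%{http_code}". The function names and the .PART handling are illustrative only and do not reflect how scurl_file actually works; whether MS servers honor the header is still unverified.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical, not part of mido.sh.

# Turn curl's reported HTTP status plus the temporary output file into an action.
#   $1 = HTTP status code as printed by curl --write-out '%{http_code}'
#   $2 = path of the temporary output file
classify_result() {
    status="$1"
    part="$2"
    case "$status" in
        200)
            # Assumption: a 200 with no body written means nothing new arrived.
            if [ -s "$part" ]; then echo "updated"; else echo "unchanged"; fi ;;
        304) echo "unchanged" ;;   # 304 Not Modified
        *)   echo "error" ;;
    esac
}

# Download $1 to $2, but only if the server copy is newer than the local file.
conditional_download() {
    url="$1"
    file="$2"

    if [ -f "$file" ]; then
        # --time-cond FILE uses FILE's modification time for If-Modified-Since.
        # --remote-time sets the downloaded file's mtime to the server's timestamp.
        status="$(curl --silent --location --remote-time \
            --time-cond "$file" --output "$file.PART" \
            --write-out '%{http_code}' "$url")"
    else
        status="$(curl --silent --location --remote-time \
            --output "$file.PART" --write-out '%{http_code}' "$url")"
    fi

    action="$(classify_result "$status" "$file.PART")"
    if [ "$action" = "updated" ]; then
        mv "$file.PART" "$file"
    else
        # This sketch discards the temporary file; it does not resume partial downloads.
        rm -f "$file.PART"
    fi
    echo "$action"
}
```

A call such as conditional_download "$url" win10x64-enterprise-eval.iso would then print updated, unchanged, or error, where $url is whatever download link Mido has already resolved.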

I don't have any plans of doing this, but it's up for grabs if someone wants! Feel free to verify any requirements @sysfu.

Update: On second thought, giving away an exact timestamp of the last download could lead to unique identification of users. Instead, if a downloaded ISO already exists, we could check the Last-Modified response header with a --head request and do the timestamp comparison on our side before deciding whether to request the updated file. That would be much more privacy friendly.
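A sketch of that client-side comparison, under the assumption that GNU date is available (date -d and date -r FILE are not POSIX, so this would need adapting for portability). All function names are hypothetical, and it remains unverified whether MS servers send Last-Modified at all.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of mido.sh. Assumes GNU date.

# Read HTTP response headers on stdin and print the Last-Modified value.
# With redirects there may be several header blocks; the last one wins.
last_modified_from_headers() {
    tr -d '\r' | sed -n 's/^[Ll]ast-[Mm]odified: *//p' | tail -n 1
}

# Exit status 0 if the remote timestamp is newer than the local file's mtime,
# 1 if it is not, 2 if either timestamp could not be read.
#   $1 = HTTP date string, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
#   $2 = path to the local file
remote_is_newer() {
    remote_epoch="$(date -u -d "$1" +%s)" || return 2
    local_epoch="$(date -u -r "$2" +%s)" || return 2
    [ "$remote_epoch" -gt "$local_epoch" ]
}

# Ask the server for headers only, then compare locally.
# No timestamp of ours is sent to the server.
should_refresh() {
    url="$1"
    file="$2"
    remote_date="$(curl --silent --head --location "$url" | last_modified_from_headers)"
    if [ -z "$remote_date" ]; then
        # No Last-Modified header: we cannot tell, so report "refresh".
        return 0
    fi
    remote_is_newer "$remote_date" "$file"
}
```

For the comparison to be meaningful, the local file's modification time should reflect the server's timestamp rather than the time of download, for example by passing --remote-time to curl when the ISO is first fetched.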