Progeny42 / refrapt

Tool to create local Debian mirrors using Python
GNU General Public License v3.0
34 stars 5 forks source link
debian mirroring

Refrapt

Build Release Status Issues Downloads


What is Refrapt?

Refrapt (Refract Apt) is originally a clone of apt-mirror, rewritten in Python, and adding a handful of improvements and fixes as identified by Open Issues and Pull Requests.

I wanted a simple way of being able to clone Debian mirrors to create my own local repository. apt-mirror is what I had originally used, until it stopped working.

Refrapt was developed on Windows, but intended for use on a Linux machine.

Features


If you've used apt-mirror, this should be familiar. However, there are a handful of new features available:

Getting Started


You will require Python 3.9 in order to install and run refrapt.

To install Refrapt, run the following pip command:

python3.9 -m pip install refrapt

The first time Refrapt is run, a default configuration file will be installed at ~/refrapt/refrapt.conf. To specify a custom location, issue the following command:

refrapt --conf "/path/to/your/config/file/refrapt.conf"

Edit the configuration file by adding each of the Repositories you wish to mirror. Examples are provided in the file.

For help with commands, run refrapt --help.

Command Line Options


--test - Runs the application as normal, but does not perform the main download of files to the local mirror, and does not clean any files identified as no longer being required. Use this option to determine how large a download is going to be, and / or how many old files can be removed.

--clean - Only perform cleaning of files no longer required. No downloads are performed. Can be used in conjunction with the --test option to identify the size of files that can be cleaned, without actually removing them. Use this option if your Repositories are not set to clean during the mirror process, or cleaning is globally disabled with disableClean in the configuration file. In conjunction with the disableClean option in the configuration file, Refrapt could clean unecessary files on a schedule such as in a cronjob.

Feature Explanation


Test Mode

Test mode can be set either via the configuration file, or passed as an argument to Refrapt using --test. Note that passing Test mode as an argument will allow you see if the parsed arguments from the Settings file. Test mode still downloads the Index files that allow Refrapt to gather all the Packages and Sources that you have defined in the configuration file, but it will not attempt the main download. This is a useful feature if you wish to see how big a download will be before comitting to it.

Why this project does not use Wget --continue or --no-if-modified-since for dealing with partial downloads

Partial downloads are a problem and there is not a simple way with Wget to resume a download without the possibility of either corrupting the file, or causing excess downloads to occur.

Refrapt uses the --timestamping (-N) option with Wget, which checks with the server the modified time of the file, and compares it with the local copy. If the timestamps match, regardless of whether the download completed, Wget will ignore the file with a 304 Not Modified. Under normal circumstances, this prevents redownloading files which you already have, saving on time and bandwidth. In the event a download was interrupted, a subsequent attempt will not succeed, as Wget will ignore the file until the file changes on the server.

Now, you could use Wget --continue, but it has problems. The --continue option compares the length of the local copy with the length of the requested file (if that option is supported by the server), and if they differ, Wget will request the remainder of the file, calculated by "(length(remote) - length(local))" in bytes. Now, if you were only downloading a log file, or an mp3, the file would likely only get longer, so appending the remaining bytes to the local copy would succeed. However, when dealing with the files in a mirror, if the contents have shifted at all, appending the remaining bytes will mean that while the sizes would match, the file is now corrupt. This will leave a corrupt version of the file in your local mirror, which will not be updated until a new update is released to the server.

Further, you could use the --no-if-modified-since option, which causes Wget to request the size and timestamp of the file which does circumvent the corruption possibilites of --continue. However not all servers are obligated, or able to return the size of the content being downloaded, which would mean that if unsupported, Wget will download the entire file each time Refrapt is run, as the sizes will not match, even though the timestamps do. This could cause unnecessary redownloads of perfectly good files.

To attain the best of both worlds, Refrapt uses lock files. Just before the call to Wget is made, a Download-lock.* file is created, with the Uri of the resource attempting to be downloaded. Once the Wget download is complete, the lock file is removed. In the event that Refrapt is interrupted, next time Refrapt is run, it will scan the VarPath for any Download-lock.* files. If any are found, this means the file on disk is only partial downloaded. Refrapt will then delete the local copy, to ensure that Wget will always get the file again.

Donate


If you found this software helpful, consider a small contribution. Thanks very much!

paypal