Open ChristophWWagner opened 4 years ago
With containers you mean archive file formats like tar and zip, not things like Docker containers, right?
How long would these containers live? Are they meant for assembling the results for long-term storage? Or for transferring files over a network? Or is it meant for short-lived interactions between local successive procedures? Then constant packing+deleting+unpacking might create a lot of overhead?!
We could easily implement those functions using the tarfile module from the Python standard library, so we wouldn't really add any dependencies beyond the standard library.
Opening or packing a tarball is possible through the tarfile.open command.
Unfortunately, adding seems to be implemented through a separate TarFile.add command.
Additionally, it can only add one file per call.
Extracting could be done all at once with the extractall(destination) command.
But we could surely implement our own pack, unpack and test functions as a tiny wrapper. I've already written pack and unpack, but testing for consistency is more difficult.
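import os.path
import tarfile as tf


def pack(filename, *files):
    # Mode "a" appends to an existing archive or creates a new one.
    with tf.open(filename, "a") as tar:
        for file in files:
            # Read from the full path, but store the member under its base name.
            tar.add(file, arcname=os.path.basename(file))


def unpack(filename, destination):
    # The with-block closes the archive, so no explicit close() is needed.
    with tf.open(filename, "r") as tar:
        tar.extractall(destination)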
It seems testing tarballs isn't that easy. With the built-in Linux tools you can only verify a tar archive while creating it. I didn't find any Python libraries able to validate tar archives. But we could implement our own function.
My approach would be a function comparing the sizes of the archived files plus their headers with the size of the archive itself. I will look into it.
I've got two functions for testing tar archives. Does anyone have some good and faulty example archives to supply, please?
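A rough sketch of that size comparison, not the function mentioned in this thread. The names check_sizes and TarIntegrityError are made up for illustration. It assumes an uncompressed archive whose writer appended the usual two zero blocks as end-of-archive marker, and it ignores special cases such as sparse files:

```python
import os
import tarfile

BLOCK = 512  # tar stores headers and data in 512-byte blocks


class TarIntegrityError(Exception):
    """Hypothetical exception type for a failed archive check."""


def check_sizes(archive):
    """Raise TarIntegrityError if `archive` is shorter than its members imply."""
    end = 0  # offset just past the padded data of the last member
    try:
        # "r:" accepts uncompressed archives only, so offsets map to file bytes.
        with tarfile.open(archive, "r:") as tar:
            for member in tar:
                data = member.size if member.isreg() else 0
                padded = -(-data // BLOCK) * BLOCK  # round up to a full block
                end = max(end, member.offset_data + padded)
    except tarfile.TarError as exc:
        # tarfile itself already notices some kinds of damage while iterating.
        raise TarIntegrityError(f"{archive}: {exc}") from exc
    # Assumption: two zero blocks follow the last member as end marker.
    required = end + 2 * BLOCK
    actual = os.path.getsize(archive)
    if actual < required:
        raise TarIntegrityError(
            f"{archive}: {actual} bytes on disk, at least {required} expected"
        )
    return required
```

This only detects archives that are too short. It would not notice flipped bits inside file data, since plain tar keeps a checksum for headers only.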
Great! The purpose of using tar files is to have a means of "packing" a full directory structure into a single blob of data, such that we can handle it just like any other data object/file within the fridge. This makes handling tar balls somewhat unique to the classes Fridge, StepShell and Resource in the following ways:
Resource must be able to create a tar ball from the resource directory given. The created tar ball will then be treated as the data object of that item and be hashed. Based on that data hash, the Resource item will be added to the fridge.
StepShell must be able to extract a tar ball from an Item object that contains a tar ball (this should only be Resource objects, if I am not forgetting something) to a temporary directory, where the given command is executed. After execution, the designated output objects shall be added to the fridge by creating Item objects. For later inspection (of build leftovers or similar), StepShell shall also create a tar ball from the contents of the temporary directory after execution and add it to the fridge for this particular run.
Fridge shall be able to verify the integrity of such a tar ball. However, I would guess that this is not necessary now and is a nice-to-have feature for later.
Item shall be able to report the contents of the tarball for easy inspection from the chefkoch CLI. However, this, too, is a nice-to-have feature that may be added later on.
From the current point of view, I'd proceed with creating a TarBall class that supports these cases and add it to the same module in which either the Fridge or the Item classes reside.
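One possible shape for such a class, covering only the Resource case (pack a directory and hash it) and the Item case (list the contents). Everything here is an assumption for discussion: the method names, the choice of SHA-256, and the decision to zero timestamps and ownership so that identical content yields an identical hash:

```python
import hashlib
import os
import tarfile


def _normalise(info):
    """Assumed policy: drop metadata that changes the hash but not the content."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


class TarBall:
    """Hypothetical wrapper around one tar archive on disk."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def from_directory(cls, directory, path):
        """Pack the contents of `directory` into a new archive at `path`."""
        with tarfile.open(path, "w") as tar:
            for name in sorted(os.listdir(directory)):
                full = os.path.join(directory, name)
                tar.add(full, arcname=name, filter=_normalise)
        return cls(path)

    def digest(self):
        """Return the SHA-256 hex digest of the archive bytes."""
        sha = hashlib.sha256()
        with open(self.path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                sha.update(chunk)
        return sha.hexdigest()

    def names(self):
        """Return the member names, e.g. for display in a CLI."""
        with tarfile.open(self.path, "r") as tar:
            return tar.getnames()
```

Normalising ownership trades away part of what tar preserves, so whether that is acceptable depends on what the steps need. File modes are kept as they are.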
The integrity test should be possible. I wrote a function comparing the file size of the archive with the size of all items plus their headers. That way it is possible to detect data that has gone missing because the archive is damaged. But for now I'm not 100% sure that it works with all file types and all tar versions, because I don't have enough test archives.
I would be grateful if anyone can provide some. Otherwise I will create some archives of every encoding containing every filetype later.
In chefkoch we would like to support calling shell scripts, too, and these often tend to work on multiple files or directories. For a variety of reasons it is desirable to handle these situations with containers:
Tarballs are a de-facto standard in Unix. They provide no compression (unless combined with gzip to produce the infamous .tar.gz), but retain user permissions and even ACLs. Also, the no-compression constraint is actually a feature when it comes to performance. Given the wide support for tar, this seems to be the natural choice of container format. However, feel free to suggest alternatives, if you come across one.
This issue shall:
Show example functions for
pack(tarball, *files), where tarball is the name of the resulting archive and files a list of files (optionally: should support globbing)
unpack(tarball, destination), where tarball is the name of the archive to be unpacked and destination is where the files shall be unpacked
test(tarball), where tarball is the name of the archive to be tested. If errors are found, an exception shall be raised.
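For the optional globbing in pack, one way it might look is sketched below. This is an illustration under assumptions, not an agreed interface: patterns are expanded with the standard glob module, only regular files are packed, and every file is stored under its base name, so two matches with the same base name would collide:

```python
import glob
import os
import tarfile


def pack(tarball, *patterns):
    """Write all files matching `patterns` to a new archive; return their paths."""
    matches = []
    for pattern in patterns:
        # A plain file name is a pattern that matches only itself.
        matches.extend(sorted(glob.glob(pattern, recursive=True)))
    # Drop duplicates (keeping order) and anything that is not a regular file.
    files = [p for p in dict.fromkeys(matches) if os.path.isfile(p)]
    with tarfile.open(tarball, "w") as tar:
        for path in files:
            tar.add(path, arcname=os.path.basename(path))
    return files
```

Returning the matched paths lets the caller see when a pattern matched nothing, instead of silently producing an empty archive.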