Closed mikedarcy closed 6 years ago
Forgive my drive-by comment, but I was just reading this spec today and happened on this PR. According to the spec, empty directories are not allowed:
Payload manifests only include the pathnames of files. Because of this, a payload manifest cannot reference empty directories. To account for an empty directory, a bag creator may wish to include at least one file in that directory; it suffices, for example, to include a zero-length file named ".keep".
I agree that the exception should be caught, though.
I also agree that your use case is a good one (creating an empty bag and then adding to it), so alternatively the spec should be updated to support the case. It did seem strange to me on reading that they specifically disallowed empty directories.
It seems like this should be technically valid as long as the manifest is completely empty, which would only allow top-level directories.
Longer-term, this seems like a good topic for bagit 2.0. We've been trying to hit 1.0 and one of the goals there was not breaking compatibility with older text format used in the early history of the spec. For 2.0 I'd like to propose dropping support with the md5deep, et al. tools so we could consider things like tracking directories or even filesystem permissions.
Any update on this @acdha? We have a downstream dependency on this issue and would like to push a release of our software that coincides with a bagit-python
release containing this fix.
v1.6.3 is on PyPI now
Many thanks!
As of the 1.6.x release series, it is no longer possible to create a bag in-place from an empty directory or directory path that does not contain any payload files. This worked in 1.5.4 and previous versions and feels like a regression.
For example:
Yields:
The following error checking logic in
make_manifests
causes this:In our application, we make use of the ability to create a bag in-place without payload files in order to create entirely "holey" bags where all payload files are referenced via fetch.txt. We do this by first creating a empty bag with no payload as a placeholder, and then later create/update the bag manifests with the respective file checkums from our fetch.txt entries.
We think that just as an empty directory is a valid directory (although arguably not very useful), so should a bag without any payload be. As long as the manifests don't contain any entries that reference the empty directory(ies), then the bag is still spec-compliant.
This PR adds code to test if the payload is empty and bypass the set-based error check in such a case. It restores functionality from previous versions of
bagit-python
which, in our opinion, is useful and relatively harmless.If this behavior is to be intentionally removed from 1.6.x and onward, then it might be worth clarifying the exception handling a bit so that a user can better understand why this exception is being thrown in the specific "empty bag" case.