LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

The selection of tag files is broken #75

Closed RKrahl closed 8 years ago

RKrahl commented 8 years ago

The function _find_tag_files(), added by PR #69 to select the files to be listed in tagmanifest files, is broken. The function was intended to select all files in the bag directory, excluding only the payload directory and the tagmanifest files themselves. What its logic actually does is select all files except those located directly in a directory whose name ends with "data". This is broken in two different ways:

  1. If the name of the bag directory itself ends with "data", all files directly in the bag directory are excluded, although bag-info.txt, bagit.txt, and manifest-*.txt in particular should be added.
  2. If the payload directory contains any subdirectories whose names do not end with "data", the files in these subdirectories are selected for inclusion in the tagmanifest files, although these files, being part of the payload, should not be added.
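
Both failure modes can be reproduced with a small sketch. The function below is a hypothetical reconstruction of the behavior described above, not the actual code from PR #69; the names buggy_find_tag_files and make_bag, and the file layout, are made up for illustration.

```python
import os
import tempfile


def buggy_find_tag_files(bag_dir):
    """Hypothetical reconstruction of the faulty selection, not the real code."""
    for dirpath, _, filenames in os.walk(bag_dir):
        # Faulty test: skips the files of ANY directory whose path ends with
        # "data", but os.walk still descends into its subdirectories.
        if dirpath.endswith("data"):
            continue
        for filename in filenames:
            if filename.startswith("tagmanifest-"):
                continue
            full = os.path.join(dirpath, filename)
            # Report paths relative to the bag, with forward slashes.
            yield os.path.relpath(full, bag_dir).replace(os.sep, "/")


def make_bag(parent, name):
    """Create a tiny illustrative bag layout under parent/name."""
    bag_dir = os.path.join(parent, name)
    os.makedirs(os.path.join(bag_dir, "data", "sub"))
    for rel in ("bagit.txt", "bag-info.txt", "data/a.txt", "data/sub/b.txt"):
        with open(os.path.join(bag_dir, *rel.split("/")), "w") as f:
            f.write("x\n")
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Case 2: the payload file data/sub/b.txt is wrongly selected.
        print(sorted(buggy_find_tag_files(make_bag(tmp, "mybag"))))
        # Case 1: bagit.txt and bag-info.txt are wrongly dropped.
        print(sorted(buggy_find_tag_files(make_bag(tmp, "mydata"))))
```

With a bag named "mybag", the payload file data/sub/b.txt shows up next to the tag files; with a bag named "mydata", the tag files in the bag root disappear and only data/sub/b.txt remains.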

This bug was discovered by Kieran O'Leary in the discussion of PR #67.

johnscancella commented 8 years ago

It is only half broken, since point 2 is incorrect. https://tools.ietf.org/html/draft-kunze-bagit-14#section-2 clearly shows that other directories are allowed in the bag root directory and are to be treated as tags.

RKrahl commented 8 years ago

Sure, but only if they are not subdirectories of the payload directory. If they are, their files are not to be considered tag files but rather part of the payload.
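
The distinction can be sketched as follows: prune only the top-level data directory, so that everything beneath it counts as payload, while any other directory in the bag root is walked as tag content. This is a minimal sketch under those assumptions, not the fix that was eventually merged; find_tag_files and write_files are hypothetical names, and skipping tagmanifest-* files only in the bag root is an assumption of this sketch.

```python
import os
import tempfile


def find_tag_files(bag_dir):
    """Yield tag files relative to bag_dir, skipping payload and tagmanifests."""
    for dirpath, dirnames, filenames in os.walk(bag_dir):
        # os.walk yields the top directory first, exactly as it was passed in.
        at_root = dirpath == bag_dir
        if at_root and "data" in dirnames:
            # Prune in place so os.walk never enters the payload directory.
            dirnames.remove("data")
        for filename in filenames:
            if at_root and filename.startswith("tagmanifest-"):
                continue
            full = os.path.join(dirpath, filename)
            yield os.path.relpath(full, bag_dir).replace(os.sep, "/")


def write_files(bag_dir, rel_paths):
    """Create small placeholder files for the demonstration."""
    for rel in rel_paths:
        full = os.path.join(bag_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as f:
            f.write("x\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = os.path.join(tmp, "mydata")  # bag name ending with "data"
        write_files(bag, [
            "bagit.txt", "manifest-md5.txt", "tagmanifest-md5.txt",
            "metadata/notes.txt",            # tag directory in the bag root
            "data/a.txt", "data/sub/b.txt",  # payload, at any depth
        ])
        print(sorted(find_tag_files(bag)))
```

Here metadata/notes.txt is selected even though its directory name ends with "data", while data/sub/b.txt is left out because it lies below the payload directory.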