LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

Ensure fetch.txt gets processed as a tag file and not a payload file … #49

Closed mikedarcy closed 8 years ago

mikedarcy commented 9 years ago

…if found when creating a new bag.

If I provide a fetch.txt tag file in my bag directory, it is treated as a payload file and placed in the data directory rather than being treated as an optional tag file as the spec outlines. In the current code, how am I supposed to provide a fetch.txt file during bag creation so that this does not happen?

This pull request contains my simple solution for getting around this. Is it sufficient, or are there other considerations? Please advise.
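For readers without the diff at hand, the requested behavior can be sketched with the standard library alone. This is a rough illustration, not bagit-python's actual code and not the patch from this pull request: the function name `move_payload_into_data` and the `TAG_FILES_TO_KEEP` set are hypothetical, and it assumes fetch.txt is the only pre-existing tag file worth special-casing.

```python
import os
import shutil
import tempfile

# Assumption for illustration: fetch.txt is the only tag file left in place.
TAG_FILES_TO_KEEP = {"fetch.txt"}


def move_payload_into_data(bag_dir):
    """Move every top-level entry except known tag files into bag_dir/data.

    Returns the sorted names of the entries that were moved.
    """
    # List entries first so the staging directory is not among them.
    entries = [name for name in os.listdir(bag_dir)
               if name not in TAG_FILES_TO_KEEP]

    # Stage into a temporary directory, then rename it to "data". This also
    # copes with a payload that already contains an entry called "data".
    staging = tempfile.mkdtemp(dir=bag_dir)
    for name in entries:
        shutil.move(os.path.join(bag_dir, name), os.path.join(staging, name))
    os.rename(staging, os.path.join(bag_dir, "data"))
    return sorted(entries)
```

With something like this in place, a fetch.txt found at the top level stays next to the other tag files, while everything else becomes payload.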

edsu commented 9 years ago

Can't you add the fetch.txt file after you have created the bag? What about other tag files?

mikedarcy commented 9 years ago

Sure, I can provide it after bag creation. I also agree that such a step is necessary for other arbitrary tag files. However, if the only other tag file one is adding is a fetch file, it makes a pretty cool feature (calling bagit.make_bag() once at creation time against an already populated dir and getting a complete and valid bag as a result) less useful.

I proposed this pull mainly because "fetch.txt" is a well-known special case as defined by the spec. In the case where one creates a bag around a fully populated directory that already includes a fetch file, it simply saves the app from moving that file up out of the data directory (where it gets placed by default) and then calling bag.save(manifests=True) later on.

Also, given a bag that already processed checksums on payload files at creation, if one is only adding a fetch file and not making any other modifications to the bag, it would save processing time, since the payload checksums would not all need to be recalculated during save(manifests=True).

Anyway, I have refactored my code so that I first create an empty bag in an empty dir, then add "fetch.txt", then add payload to the data directory, and then call save(manifests=True). It is a reasonable workaround for me at this point.
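The workaround might look roughly like the sketch below. It is not the code from my application: `write_fetch_txt` and `build_bag_with_fetch` are hypothetical helper names, the `add_payload` callback is an assumption about how payload gets copied in, and the only bagit-python calls used are `bagit.make_bag()` and `Bag.save(manifests=True)`. Each fetch.txt line follows the spec's `URL LENGTH FILEPATH` layout, with `-` standing in for an unknown length.

```python
import os


def write_fetch_txt(bag_dir, entries):
    """Write fetch.txt in bag_dir from (url, length, filepath) tuples.

    A length of None is written as "-", which the BagIt spec allows
    when the size of the remote file is unknown.
    """
    lines = []
    for url, length, filepath in entries:
        size = "-" if length is None else str(length)
        lines.append(f"{url} {size} {filepath}\n")
    fetch_path = os.path.join(bag_dir, "fetch.txt")
    with open(fetch_path, "w", encoding="utf-8") as handle:
        handle.writelines(lines)
    return fetch_path


def build_bag_with_fetch(bag_dir, fetch_entries, add_payload):
    """Create an empty bag, add fetch.txt, add payload, regenerate manifests.

    add_payload is a hypothetical callback that receives the path of the
    bag's data directory and puts the local payload files there.
    """
    # Imported lazily so write_fetch_txt stays usable without bagit-python.
    import bagit

    os.makedirs(bag_dir, exist_ok=True)
    bag = bagit.make_bag(bag_dir)            # empty dir, so an empty bag
    write_fetch_txt(bag_dir, fetch_entries)  # tag file at the bag root
    add_payload(os.path.join(bag_dir, "data"))
    bag.save(manifests=True)                 # recompute payload manifests
    return bag
```

The cost of this ordering is that all payload checksums are computed in the final save() call rather than by make_bag(), which is fine when the bag is built in one pass.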

Ultimately, since this is more of a request for a convenience feature for a specific use case, I won't be heart-broken if you reject the pull.

Cheers, Mike

johnscancella commented 8 years ago

I can understand the appeal, but the use case of a half-made bag seems wrong to me. I don't think this is the direction that we want to go; however, I could see the use of a command line flag to point to a fetch file that you want included.