ahankinson / pybagit

Python library for manipulating bagit files.
http://ahankinson.github.io/pybagit

Allow creation of a bag in an already existing directory #7

Open RKrahl opened 8 years ago

RKrahl commented 8 years ago

I'm not quite sure whether the current behavior is a bug or a feature. Let's assume the latter and take this as a feature request.

I start from the following situation: I already have the base directory in place and it contains the data directory with the payload. Apart from that, the base directory contains no further files or directories. E.g. I have:

bag1
\--- data/
      |--- test1.dat
      \--- test2.dat

I want to turn this into a bag, i.e. create the missing bagit.txt, bag-info.txt, manifest-sha1.txt and so on with PyBagIt.

This does not seem to be possible with the current version, as it assumes it will find a valid bag whenever the base directory exists:

>>> from pybagit.bagit import BagIt
>>> bag = BagIt("bag1")
>>> bag.update()
Traceback (most recent call last):
  ...
IOError: [Errno 2] No such file or directory: '...bagIt/bag1/bagit.txt'

RKrahl commented 8 years ago

Ping!

I posted this issue to start a discussion on this possible feature. So I really would like to get some comments.

What do you think? Do you think this feature is of any use? Do you want to have it in pybagit? Or would you be willing to accept it as a contribution?

ahankinson commented 8 years ago

Sorry -- I had a look at the code and tried to figure out how best to implement it, but didn't have a chance to get back to you.

Certainly this feature would be very useful, and I would be happy to have a contribution from you.

RKrahl commented 8 years ago

I think what needs to be done for this particular use case can be sketched as follows: we would need a modified version of BagIt._create_bag() that does not start from the assumption that the bag directory does not exist, but rather allows some parts of it to be already in place and completes it by adding the missing bits. That is, at every step in the code, replace "create file or directory" with "if file or directory exists, use what is already there, otherwise create it". The constructor should always call this modified _create_bag() method, even if the bag directory already exists.
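
To make the "use what is already there, otherwise create it" idea concrete, here is a minimal standalone sketch using only the standard library. It is not pybagit code: the helper names ensure_dir, ensure_file, sha1_manifest and complete_bag are hypothetical, the BagIt version written to bagit.txt (0.97) is an assumption, and bag-info.txt is simply left empty.

```python
import hashlib
import os


def ensure_dir(path):
    """Create the directory only if it is missing."""
    if not os.path.isdir(path):
        os.makedirs(path)


def ensure_file(path, default_content):
    """Write default_content only if the file is missing.

    Returns True if the file was created, False if it was already there.
    """
    if os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(default_content)
    return True


def sha1_manifest(bag_dir):
    """Build manifest lines ("<sha1>  <path>") for everything below data/."""
    lines = []
    for root, dirs, files in os.walk(os.path.join(bag_dir, "data")):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            full = os.path.join(root, name)
            with open(full, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            # Manifest paths are relative to the bag base and use "/".
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            lines.append("%s  %s\n" % (digest, rel))
    return "".join(lines)


def complete_bag(bag_dir):
    """Add the missing parts of a bag without touching existing ones."""
    ensure_dir(bag_dir)
    ensure_dir(os.path.join(bag_dir, "data"))
    # Assumption: declare BagIt version 0.97 with UTF-8 tag files.
    ensure_file(os.path.join(bag_dir, "bagit.txt"),
                "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    ensure_file(os.path.join(bag_dir, "bag-info.txt"), "")
    ensure_file(os.path.join(bag_dir, "manifest-sha1.txt"),
                sha1_manifest(bag_dir))
```

Calling complete_bag("bag1") on the directory from my first comment would add bagit.txt, bag-info.txt and manifest-sha1.txt next to the existing data directory. Calling it a second time changes nothing, because every step checks for existence first.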

For other use cases, this behavior could be a problem, because we might inadvertently "repair" a broken bag where we should rather report an error. This could be solved by adding a "create" flag to the constructor arguments. The flag should be optional and default to False. The logic in the constructor should be: "if bag dir exists and not create, then call _open_bag(), else call _create_bag()". This way, the current default behavior of BagIt would not change. A bag would only be created in an already existing directory if the caller explicitly requests it.
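
The constructor logic could look roughly like the following sketch. BagSketch is a hypothetical stand-in class, not the real BagIt class, and the bodies of _open_bag() and _create_bag() are reduced to placeholders (again assuming BagIt version 0.97 for the declaration). Only the dispatch on the "create" flag is the point here.

```python
import os


class BagSketch(object):
    """Hypothetical stand-in for BagIt, showing only the dispatch logic."""

    def __init__(self, bag_dir, create=False):
        self.bag_dir = bag_dir
        if os.path.isdir(bag_dir) and not create:
            # Default: an existing directory must already be a valid bag.
            self.mode = self._open_bag()
        else:
            # New directory, or completion explicitly requested.
            self.mode = self._create_bag()

    def _open_bag(self):
        # Placeholder check: report an error instead of repairing.
        declaration = os.path.join(self.bag_dir, "bagit.txt")
        if not os.path.isfile(declaration):
            raise IOError("not a valid bag, missing %s" % declaration)
        return "opened"

    def _create_bag(self):
        # Placeholder: keep what exists, create only what is missing.
        for path in (self.bag_dir, os.path.join(self.bag_dir, "data")):
            if not os.path.isdir(path):
                os.makedirs(path)
        declaration = os.path.join(self.bag_dir, "bagit.txt")
        if not os.path.exists(declaration):
            with open(declaration, "w") as f:
                f.write("BagIt-Version: 0.97\n"
                        "Tag-File-Character-Encoding: UTF-8\n")
        return "created"
```

With this, BagSketch("bag1") on the directory from my first comment would still raise an IOError, as it does today, while BagSketch("bag1", create=True) would complete the bag in place.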