LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

Handle tags in an order-preserving manner #128

Open UmbrellaDish opened 5 years ago

UmbrellaDish commented 5 years ago

As your code shows, and as we found in testing, entries in the tag file are parsed into a dictionary. The IETF specification, however, reads ...

The "bag-info.txt" file is a tag file that contains metadata elements describing the bag and the payload. The metadata elements contained in the "bag-info.txt" file are intended primarily for human use. All metadata elements are OPTIONAL and MAY be repeated. Because "bag-info.txt" is intended for human reading and editing, ordering MAY be significant and the ordering of metadata elements MUST be preserved.

Source: BagIt File Format Specification, v1.0

..., so the contents of that file, which is meant for human use anyway, should be stored in an order-preserving tuple. That would, for instance, let users record multiple organizations, each with its own group of metadata items that have to be kept together. Otherwise, with strict alphabetical ordering as it is now, you cannot tell with certainty which Source-Organization, Organization-Address, Contact-Email, etc. belong together.

acdha commented 5 years ago

This is only true when running on Python 2. We could have a patch which switches to collections.OrderedDict on Python 2.7, but there's an argument that anyone who needs this should just upgrade to Python 3, since 2.x has an end of life coming soon anyway.

UmbrellaDish commented 5 years ago

Hi Chris, I don't think Python 3 vs. 2 is the cause. We run bagit as an imported module in our Python 3 environment.

Also, an OrderedDict of lists still groups entries with the same key together, which is a problem when you want to list more than one organization, each followed by its related entries, in sequence.
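To illustrate the grouping problem, here is a minimal sketch in plain Python. It does not use bagit's own parser; the tag values and the helper name `group_by_key` are made up for this example, and it only assumes that repeated keys end up collected into a list per key.

```python
from collections import OrderedDict

# Hypothetical bag-info.txt entries for two organizations, in file order.
pairs = [
    ("Source-Organization", "Org A"),
    ("Contact-Email", "a@example.org"),
    ("Source-Organization", "Org B"),
    ("Contact-Email", "b@example.org"),
]


def group_by_key(pairs):
    """Collect repeated keys into lists, as a dict-of-lists structure would."""
    grouped = OrderedDict()
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


grouped = group_by_key(pairs)

# Writing the grouped form back out puts both Source-Organization entries
# first, so each Contact-Email is no longer next to its organization.
flattened = [(key, value) for key, values in grouped.items() for value in values]
```

Even though the OrderedDict remembers the order in which keys were first seen, `flattened` differs from `pairs`, which is the interleaving that a flat sequence of pairs would keep.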

So I would like to propose storing a tuple of key-value pairs in Bag.info. You can (and in my opinion should) retain support for tags passed as a dictionary, which would just need to be converted with something like sorted([ i for i in tags.items() ], key=lambda i: i[0]).
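A rough sketch of what that normalization could look like. The function name `normalize_tags` is hypothetical and not part of bagit; it assumes tags arrive either as a mapping or as an iterable of key-value pairs.

```python
from collections.abc import Mapping


def normalize_tags(tags):
    """Return tags as a tuple of (key, value) pairs.

    Hypothetical helper, not part of bagit. Mappings are sorted by key,
    as in the proposal; any other iterable of pairs keeps its own order,
    including repeated keys.
    """
    if isinstance(tags, Mapping):
        return tuple(sorted(tags.items(), key=lambda item: item[0]))
    return tuple((key, value) for key, value in tags)


# A dictionary is converted to key-sorted pairs.
from_dict = normalize_tags({"Source-Organization": "Org A", "Contact-Email": "a@example.org"})

# A sequence of pairs is kept exactly as given.
from_pairs = normalize_tags([
    ("Source-Organization", "Org A"),
    ("Contact-Email", "a@example.org"),
    ("Source-Organization", "Org B"),
    ("Contact-Email", "b@example.org"),
])
```

With this shape, existing callers that pass a dictionary would still get the current alphabetical output, while callers that care about ordering could pass pairs directly.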