LibraryOfCongress / bagit-conformance-suite

Test cases for validating BagIt implementations
Other

Test cases for bag-info values #8

Open acdha opened 7 years ago

acdha commented 7 years ago

In a discussion with @johnscancella and @reeset on https://github.com/reeset/bagit_sharp/issues/1, we came up with some additions to the duplicate-metadata test case:

johnscancella commented 7 years ago

Case-insensitive handling of the keys – e.g. "Contact-Email" and "contact-email". https://tools.ietf.org/html/draft-kunze-bagit-14#page-6 only mandates insensitivity for the reserved names but I think this should be clarified in the spec to mandate case-insensitive access for all values to avoid confusion.

Is this getting into too much implementation detail?

acdha commented 7 years ago

@johnscancella I'm a bit mixed on that but I've been leaning case-insensitive for everything since the info file is intended to be human-managed and humans tend not to care. It seems like a bug if someone has “Crawl-date” in one bag and “Crawl-Date” in another but a program only sees one of them because it used a case-sensitive library.
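To make that failure mode concrete, here is a minimal Python sketch using two hypothetical bag-info dictionaries (the labels and dates are invented for illustration). An exact-match lookup silently misses the "Crawl-date" spelling, while comparing labels with `str.casefold()` finds the value in both bags.

```python
# Hypothetical bag-info values from two bags; only the label case differs.
bag_a = {"Crawl-date": "2017-03-01"}
bag_b = {"Crawl-Date": "2017-04-15"}


def crawl_date_case_sensitive(info):
    # Exact-match lookup, as a plain dict does it: "Crawl-date" != "Crawl-Date".
    return info.get("Crawl-Date")


def crawl_date_case_insensitive(info):
    # Compare labels after casefold() so either spelling matches.
    for label, value in info.items():
        if label.casefold() == "crawl-date":
            return value
    return None


print(crawl_date_case_sensitive(bag_a))    # None: bag_a's date is silently missed
print(crawl_date_case_insensitive(bag_a))  # 2017-03-01
print(crawl_date_case_insensitive(bag_b))  # 2017-04-15
```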

What do you think: should we play it conservatively and add a spec update before finishing this ticket? This may be partly out of scope, or otherwise a poor fit for this repo as it stands, since we only collect valid and invalid bags; this would be a difference in how a tool processes a bag rather than a question of validity.

johnscancella commented 7 years ago

I vote to make a spec update saying that keys are case-insensitive but values are case-sensitive, since a value's case might carry special meaning.
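One way to get case-insensitive keys with case-sensitive values is sketched below in Python. `BagInfo` is a hypothetical helper, not bagit.py's or bagit-java's API: it matches labels with `str.casefold()` but stores every label and value exactly as written, and it keeps repeated labels rather than overwriting them, which matters for the duplicate-metadata test case.

```python
class BagInfo:
    """Hypothetical ordered store of bag-info tags.

    Labels are matched case-insensitively (via str.casefold()), values are
    never altered, and each label keeps the spelling it was added with.
    Repeated labels are kept as separate entries.
    """

    def __init__(self):
        self._tags = []  # (label, value) pairs in file order

    def add(self, label, value):
        self._tags.append((label, value))

    def get_all(self, label):
        """Return every value whose label matches, ignoring the label's case."""
        wanted = label.casefold()
        return [value for key, value in self._tags if key.casefold() == wanted]

    def labels(self):
        """Labels in their original spelling and order."""
        return [key for key, _ in self._tags]


info = BagInfo()
info.add("Crawl-date", "2017-03-01")
info.add("Crawl-Date", "2017-04-15")
info.add("Source-Organization", "Example Org")

print(info.get_all("CRAWL-DATE"))           # ['2017-03-01', '2017-04-15']
print(info.get_all("source-organization"))  # ['Example Org']
print(info.labels())                        # ['Crawl-date', 'Crawl-Date', 'Source-Organization']
```

Returning a list from `get_all` rather than a single string is a design choice here: it avoids silently picking one value when a label appears more than once under different capitalizations.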

I would also add that implementations should preserve the case of keys as entered. That way you can do something like:

  1. read an existing bag
  2. write it out to a different directory
  3. compare the files one by one; there should be zero differences

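The three steps above can be sketched for a single tag file. This Python example is a sketch under stated assumptions, not a test from this repo: `read_bag_info`, `write_bag_info`, and `round_trip_is_lossless` are hypothetical names, the parser handles only one `Label: value` tag per line (no folded continuation lines), and the writer always emits one space after the colon. A complete check would also cover the payload, the manifests, and the other tag files.

```python
import filecmp
import tempfile
from pathlib import Path


def read_bag_info(path):
    """Hypothetical reader: 'Label: value' lines -> ordered (label, value) pairs.

    Assumes one tag per line with no folded continuation lines. Labels are
    kept exactly as written, so their original case survives.
    """
    tags = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Label: value' line: {line!r}")
        tags.append((label.strip(), value.strip()))
    return tags


def write_bag_info(path, tags):
    """Hypothetical writer: emit tags in their original order and case."""
    path.parent.mkdir(parents=True, exist_ok=True)
    text = "".join(f"{label}: {value}\n" for label, value in tags)
    path.write_text(text, encoding="utf-8")


def round_trip_is_lossless(bag_info_text):
    """Read a bag-info.txt, write it to another directory, compare the files."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "original", "bag-info.txt")
        dst = Path(tmp, "copy", "bag-info.txt")
        src.parent.mkdir()
        src.write_text(bag_info_text, encoding="utf-8")
        write_bag_info(dst, read_bag_info(src))
        # shallow=False compares file contents instead of trusting os.stat().
        return filecmp.cmp(src, dst, shallow=False)


sample = "Contact-Email: someone@example.org\nCrawl-date: 2017-03-01\n"
print(round_trip_is_lossless(sample))  # True
```

Because this writer normalizes spacing, an input such as `Crawl-date:2017-03-01` (no space after the colon) fails the comparison even though no data changed, so a byte-for-byte round-trip test would have to account for whitespace normalization.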
acdha commented 7 years ago

Definitely, I’m certain the intention was always that only keys were case-insensitive.

I wonder what test process we should recommend: what we have right now is basic compliance, where a bag is either valid or invalid. Do you think we should specify that testing has to include a load-save cycle, or define something like compliance levels, where a level 1 implementation could be one that only validates and can't do anything else?

johnscancella commented 6 years ago

Yeah, we can specify that in the README and use bagit.py and bagit-java as reference implementations for testing.