LibraryOfCongress / bagit-java

Java library to support the BagIt specification.

Bag completeness for tag files #124

Open rvanheest opened 6 years ago

rvanheest commented 6 years ago

The BagIt v16 spec says the following about the completeness of a bag:

  1. Every file listed in every tag manifest MUST be present.

Likewise, the earlier v14 spec says:

  1. Every file in every tag manifest MUST be present. Tag files not listed in a tag manifest MAY be present.

However, the BagVerifier only verifies completeness in terms of the payload manifests, not in terms of the tag manifests. Shouldn't that be added as well? Or is that done somewhere else?

jscancella commented 6 years ago

@rvanheest and @acdha, the PayloadVerifier actually does verify all files listed in all manifests, tag manifests included. It should probably be renamed or refactored to make this more obvious.

It does this by first collecting all the files listed in all manifests (see https://github.com/LibraryOfCongress/bagit-java/blob/master/src/main/java/gov/loc/repository/bagit/verify/PayloadVerifier.java#L115-L134) and then verifying that they exist (see https://github.com/LibraryOfCongress/bagit-java/blob/master/src/main/java/gov/loc/repository/bagit/verify/PayloadVerifier.java#L102-L103).
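
For readers who don't want to open the links, the two steps can be sketched roughly as follows. This is only an illustration of the idea, not the code in PayloadVerifier: the class `ListedFilesCheck` and the method `findMissing` are hypothetical names, and each manifest is assumed to have been parsed into a collection of file paths already.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of bagit-java.
class ListedFilesCheck {

    /**
     * Returns every path that is listed in at least one manifest but does not exist on disk.
     * Each inner collection holds the file paths of one parsed manifest (payload or tag).
     */
    static Set<Path> findMissing(Collection<? extends Collection<Path>> manifests) {
        // Step 1: collect the union of all files listed in all manifests.
        Set<Path> listed = new LinkedHashSet<>();
        for (Collection<Path> manifest : manifests) {
            listed.addAll(manifest);
        }

        // Step 2: every listed file MUST be present, so record the ones that are not.
        Set<Path> missing = new LinkedHashSet<>();
        for (Path file : listed) {
            if (!Files.exists(file)) {
                missing.add(file);
            }
        }
        return missing;
    }
}
```

Because payload manifests and tag manifests are fed into the same union, a file listed only in a tag manifest is checked just like a payload file. An empty result means the bag is complete in this direction.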

rvanheest commented 6 years ago

Thanks for the quick response (as always!). I now see what my confusion was: in PayloadVerifier I mainly looked at lines 105-109 and skimmed past lines 102-103. Lines 102-103 do the check mentioned in my original post. Lines 105-109 check the opposite: "all files in the payload directory should be listed in all manifests".

https://github.com/LibraryOfCongress/bagit-java/blob/2b7002e62d721f5eb50461e4c4c70a6ef643ec1d/src/main/java/gov/loc/repository/bagit/verify/PayloadVerifier.java#L98-L110

Thanks for the clarification. That helped a lot. Yes, refactoring this a bit probably won't hurt.
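
To make the opposite direction concrete (every file in the payload directory should be listed in all manifests), here is a minimal sketch. It is not the library's implementation: `PayloadListingCheck`, `payloadFiles`, and `findUnlisted` are hypothetical names, and the manifests are assumed to be already parsed into collections of paths written in the same form (for example, all absolute) as the paths found on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of bagit-java.
class PayloadListingCheck {

    /** Walks the payload directory and returns every regular file found under it. */
    static Set<Path> payloadFiles(Path dataDir) {
        try (Stream<Path> walk = Files.walk(dataDir)) {
            return walk.filter(Files::isRegularFile)
                       .collect(Collectors.toCollection(TreeSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the files found on disk that are absent from at least one manifest.
     * Paths in onDisk and in the manifests must use the same form to compare equal.
     */
    static Set<Path> findUnlisted(Set<Path> onDisk,
                                  Collection<? extends Collection<Path>> manifests) {
        Set<Path> unlisted = new TreeSet<>();
        for (Path file : onDisk) {
            for (Collection<Path> manifest : manifests) {
                if (!manifest.contains(file)) {
                    unlisted.add(file);
                    break; // one manifest that omits the file is enough
                }
            }
        }
        return unlisted;
    }
}
```

A caller would typically pass `payloadFiles(bagRoot.resolve("data"))` as the first argument, where `bagRoot` is an assumed variable pointing at the bag's root directory. Only payload manifests belong in the second argument here, since tag manifests are not expected to list payload files.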