LibraryOfCongress / bagger

The Bagger application packages data files according to the BagIt specification.

Deletion of tagmanifest-md5.txt does not invalidate Bag #56

Closed jo1525085251 closed 6 years ago

jo1525085251 commented 6 years ago

Environment

OS: Windows 10 64bit Enterprise Java: Java 1.8.0_171-b11 Version: Bagger 2.8.1.

Steps to reproduce

Expected results?

Actual result?

2018-05-08 10:51:32,877 INFO [Thread-4] g.l.r.b.v.i.CompleteVerifierImpl [CompleteVerifierImpl.java:276] Completed verification that bag is complete.
2018-05-08 10:51:32,878 INFO [Thread-4] g.l.r.b.v.i.CompleteVerifierImpl [CompleteVerifierImpl.java:277] Note that this a verification of completeness, not validity. A bag may be complete without being valid, though a valid bag must be complete.
2018-05-08 10:51:32,880 INFO [Thread-4] g.l.r.b.v.i.CompleteVerifierImpl [CompleteVerifierImpl.java:278] Result of verification that complete: Result is true.
2018-05-08 10:51:32,895 INFO [Thread-4] g.l.r.b.v.i.ValidVerifierImpl [ValidVerifierImpl.java:83] Completed verification that bag is valid.
2018-05-08 10:51:32,896 INFO [Thread-4] g.l.r.b.v.i.ValidVerifierImpl [ValidVerifierImpl.java:84] Validity check: Result is true.
2018-05-08 10:51:35,083 INFO [AWT-EventQueue-0] g.l.r.b.u.BagView [BagView.java:660] Stopped the timer


johnscancella commented 6 years ago

Hi @jo1525085251, thanks for submitting this; it is really appreciated! This is the expected behavior, since no file tracks a tag manifest. When you delete tagmanifest-md5.txt, nothing records that the file should exist, so the program cannot tell you it is missing. A tagmanifest-md5.txt is an optional file in a bag; you can read more about that in the specification: https://tools.ietf.org/html/draft-kunze-bagit-14#section-2.2.1
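Because the specification treats tag manifests as optional, a rule such as "every bag we accept must carry one" has to be enforced as a separate check outside bag validation. A minimal sketch of such a check, using only the Python standard library, might look like this. The function names and the policy itself are hypothetical, and this is not how Bagger works internally.

```python
from pathlib import Path


def find_tag_manifests(bag_dir):
    """Return the names of tagmanifest-<algorithm>.txt files in the bag root."""
    root = Path(bag_dir)
    return sorted(p.name for p in root.glob("tagmanifest-*.txt") if p.is_file())


def check_tag_manifest_policy(bag_dir):
    """Hypothetical local policy: a bag must carry at least one tag manifest.

    This is stricter than the BagIt specification, which treats tag
    manifests as optional.
    """
    return len(find_tag_manifests(bag_dir)) > 0
```

Running `check_tag_manifest_policy` alongside the normal validity check would flag a deleted tag manifest that validation alone accepts.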

jo1525085251 commented 6 years ago

Hi John

Thanks for that. I should have realised that it was the expected behaviour.

A bit of background: I am writing some test cases around our use of BagIt, and someone said "What happens if we delete...?". So I thought I would give it a go. I'll check out the specification and reflect that back to our QA so we have a "position" if we are asked that at audit.

Cheers

Jon



johnscancella commented 6 years ago

Hi Jon,

The BagIt specification can be confusing at times. I don't know how much help this is, but you can take a look at https://github.com/LibraryOfCongress/bagit-conformance-suite which is a collection of good and bad bag examples that show a lot of the edge cases we run across.

In this specific case, you can mitigate it a little by having multiple tag manifests that use different algorithms (you already have md5, so you could add sha256). That way the md5 tag manifest can reference the sha256 manifest and vice versa.
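As a rough illustration of adding a second tag manifest, the sketch below writes a tagmanifest-sha256.txt covering the tag files in the bag root, using only the Python standard library. The function name is hypothetical. As a simplifying assumption it handles only files directly in the bag root and leaves other tagmanifest-*.txt files out of the listing; check the version of the specification you follow before changing that.

```python
import hashlib
from pathlib import Path


def write_sha256_tag_manifest(bag_dir):
    """Write tagmanifest-sha256.txt covering the tag files in the bag root.

    Assumption: other tagmanifest-*.txt files are left out of the listing.
    """
    root = Path(bag_dir)
    lines = []
    for path in sorted(root.iterdir()):
        # Skip directories (such as data/) and any tag manifests.
        if not path.is_file() or path.name.startswith("tagmanifest-"):
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.name}")
    out = root / "tagmanifest-sha256.txt"
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```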

On a completely different note, md5 really shouldn't be used anymore, since checksum collisions are possible and can cause problems. If you are able, I would go with sha512 or sha256. And if you are comfortable, you can write some Java code using https://github.com/LibraryOfCongress/bagit-java/blob/master/src/main/java/gov/loc/repository/bagit/conformance/BagLinter.java to check your bag for issues automatically.
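For test cases that only need to spot md5 manifests, a small standard-library check may be enough. The sketch below is not BagLinter and does not reproduce its checks; the function name and the set of algorithms treated as weak are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: algorithms treated as weak for this local check.
WEAK_ALGORITHMS = {"md5", "sha1"}


def weak_manifests(bag_dir):
    """Return manifest and tag manifest file names that use a weak algorithm."""
    found = []
    for path in sorted(Path(bag_dir).glob("*manifest-*.txt")):
        # File names look like manifest-md5.txt or tagmanifest-sha256.txt.
        algorithm = path.stem.split("-", 1)[1].lower()
        if algorithm in WEAK_ALGORITHMS:
            found.append(path.name)
    return found
```

An empty list means no manifest in the bag root is named for an algorithm in `WEAK_ALGORITHMS`.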

Best regards