LibraryOfCongress / bagger

The Bagger application packages data files according to the BagIt specification.

Bagger to validate batches of bags on local/network storage, write and hash txt report file #26

Closed houzanme1 closed 8 years ago

houzanme1 commented 8 years ago

Since most users just receive records and bag them, it is hard to validate bag by bag. Having Bagger validate all the bags in a folder and its subfolders would be very useful.

Essentially, being able to ask Bagger to validate an entire repository of many bags in various subfolders on a drive or network storage, and then to create a text report (with its hash) listing the successful and failed bag validations each time it finishes validating the entire repository, would be a highly coveted improvement.

Expected outcomes:

If you judge that this would be better implemented in a new Bagger architecture, there is no need to overhaul the current one.

Tibaut
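The report-with-hash idea above could be sketched roughly as follows. This is only an illustration, not Bagger code: the function name `write_report`, the report layout, the choice of SHA-256, and the `.sha256` sidecar file are all assumptions made for the example.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def write_report(results, started, report_path):
    """Write a plain-text validation report plus a SHA-256 sidecar file.

    results: mapping of bag path (str) -> True (valid) or False (invalid).
    started: datetime at which the validation run began.
    Returns the hex digest of the report file.
    (Hypothetical helper; names and layout are assumptions.)
    """
    report_path = Path(report_path)
    lines = [f"Validation started: {started.isoformat()}"]
    for bag, ok in sorted(results.items()):
        lines.append(f"{'VALID' if ok else 'INVALID'}\t{bag}")
    # Write bytes so the hashed content is identical on every platform.
    report_path.write_bytes(("\n".join(lines) + "\n").encode("utf-8"))

    digest = hashlib.sha256(report_path.read_bytes()).hexdigest()
    # Sidecar in the "<digest>  <filename>" layout that sha256sum also uses.
    sidecar = report_path.with_name(report_path.name + ".sha256")
    sidecar.write_text(f"{digest}  {report_path.name}\n", encoding="utf-8")
    return digest
```

Keeping the hash in a separate file means the report itself can be re-hashed and compared later without editing it.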

johnscancella commented 8 years ago

So to clarify and put into given when then style:

Given a folder that contains many bags that are not nested within each other
When bagger is pointed at the directory
Then it validates (verify valid according to the spec) each found bag in the directory

Given bags that are and are not valid
When bagger has completed validating multiple bags
Then the report contains which bags are valid and which are not, along with the date of when bagger started verifying the bags

Do I have that correct? If so, might I suggest that the report file name include the date. That date would be in the format yyyy-MMM-dd, with dd being the two-digit day of the month, MMM being the three-letter abbreviation of the month, and yyyy being the four-digit year. So, for example, something like:

Given a report of validating multiple bags by bagger
When bagger is writing the report to the filesystem
Then the name of the report is in the format multiple-bag-verify-yyyy-MMM-dd.txt
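To make the proposed naming concrete, here is a small sketch of building such a file name. The function name `report_name` is hypothetical, and using fixed English month abbreviations (rather than the system locale) is an assumption of this example.

```python
from datetime import date

# Fixed English abbreviations, so the name does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def report_name(started):
    """Return multiple-bag-verify-yyyy-MMM-dd.txt for the given start date."""
    month = MONTHS[started.month - 1]
    return f"multiple-bag-verify-{started.year:04d}-{month}-{started.day:02d}.txt"


print(report_name(date(2016, 3, 11)))  # multiple-bag-verify-2016-Mar-11.txt
```

Because the month is abbreviated rather than numeric, these names do not sort chronologically by file name alone.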

houzanme1 commented 8 years ago

That is a correct representation, and I clarified and included one (variant) condition below.

Tibaut

On Fri, Mar 11, 2016 at 3:59 PM, John Scancella notifications@github.com wrote:

So to clarify and put into given when then style http://martinfowler.com/bliki/GivenWhenThen.html:

Given a folder that contains many bags that are not nested within each other
When bagger is pointed at the directory
Then it validates (verify valid according to the spec) each found bag in the directory

(variant)
Given a main folder that contains many bags organized within many subfolders (with any depth) that are nested within each other
When bagger is pointed at the main directory
Then it validates (verify valid according to the spec) each found bag in the main directory and in every subdirectory (with any depth)
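The recursive variant could look something like the sketch below. It rests on several assumptions: a directory counts as a bag when it contains the `bagit.txt` declaration file that the BagIt specification requires, the search does not descend into a bag once one is found, and the actual validation is delegated to a caller-supplied `validate` function. The names `find_bags` and `validate_all` are hypothetical.

```python
import os


def find_bags(root):
    """Yield every directory under root, at any depth, that looks like a bag."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "bagit.txt" in filenames:
            yield dirpath
            # Assumption: bags are not nested, so do not search inside one.
            dirnames[:] = []
        else:
            dirnames.sort()  # visit subfolders in a predictable order


def validate_all(root, validate):
    """Map each bag path found under root to True (valid) or False (invalid).

    validate is a caller-supplied callable taking a bag path; an exception
    raised while validating a bag is recorded as an invalid bag.
    """
    results = {}
    for bag in find_bags(root):
        try:
            results[bag] = bool(validate(bag))
        except Exception:
            results[bag] = False
    return results
```

Passing the validator in as a parameter keeps the directory walk independent of whichever BagIt implementation performs the actual check.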



johnscancella commented 8 years ago

Thinking about this more, this is outside the scope of Bagger and should therefore be its own command-line tool.