bagit-profiles / bagit-profiles-specification

https://bagit-profiles.github.io/bagit-profiles-specification/

Proposal: Tag-Files-Allowed #14

Closed: kba closed this 5 years ago

kba commented 5 years ago

Tag-Files-Allowed: List of non-payload tag files allowed to be present

Our use case: allow, but not require, bundling README.md and Makefile at the top level of the bag to explain (README) or reproduce (Makefile) the provenance of the payload.

We do not want to require data providers to bundle these files, but we also do not want them to provide any other files.

mjordan commented 5 years ago

@kba how would validation of an "allowed" list work? Wouldn't it always pass validation (e.g. if the file is there, it passes, and if it's not, it also passes since the file is optional)?

kba commented 5 years ago

Validation would only fail if a file was present that was not explicitly allowed to be there.

mjordan commented 5 years ago

Makes sense, but we currently only have a Tag-Files-Required directive. Does your proposal also include adding a Tag-Files-Not-Allowed directive in addition to Tag-Files-Allowed?

kba commented 5 years ago

I think Tag-Files-Required and Tag-Files-Allowed should be enough:

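For every tag file present in a bag: if it is not in Tag-Files-Allowed, validation fails.

For every entry in Tag-Files-Required: if it is not present in the bag, validation fails (as it does now).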
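A minimal sketch of these two checks, assuming the tag file names have already been collected from the bag, that the standard BagIt tag files (bagit.txt, bag-info.txt, the manifests) have been excluded from that list, and that entries are matched as exact names. The function name check_tag_files is hypothetical, not part of any existing validator.

```python
def check_tag_files(present, required, allowed):
    """Return a list of error messages; an empty list means the bag passes.

    present  -- extra tag files found in the bag (assumed: the standard BagIt
                tag files such as bagit.txt and the manifests are excluded)
    required -- entries of Tag-Files-Required
    allowed  -- entries of Tag-Files-Allowed (exact names only in this sketch)
    """
    errors = []
    # For every tag file present in the bag: it must be explicitly allowed.
    for name in present:
        if name not in allowed:
            errors.append(f"tag file not allowed: {name}")
    # For every entry in Tag-Files-Required: it must be present in the bag.
    for name in required:
        if name not in present:
            errors.append(f"required tag file missing: {name}")
    return errors


# README.md and Makefile are allowed but optional; anything else fails.
allowed = ["README.md", "Makefile"]
print(check_tag_files(["README.md"], [], allowed))  # []
print(check_tag_files([], [], allowed))  # []
print(check_tag_files(["README.md", "notes.txt"], [], allowed))
# ['tag file not allowed: notes.txt']
```

In this sketch a required file that is not also listed in Tag-Files-Allowed fails the first check, so a profile would have to list it in both.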

I just realized, though, that there is no way to specify that all tag files are allowed (the current behavior). Requiring an explicit list would probably break validation for a lot of existing bags 😟 Perhaps allow shell wildcards?

# Allow all
Tag-Files-Allowed: ['*']
# Allow markdown files
Tag-Files-Allowed: ['*.md']
# Allow only README.md beyond the standard tag files
Tag-Files-Allowed: ['README.md']
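A sketch of how a validator might match tag file names against such patterns, using Python's standard fnmatch module as a stand-in for shell wildcards. That choice is an assumption; the thread does not pin down the matching rules. One difference worth noting: fnmatch's `*` also matches `/`, unlike a POSIX shell glob, so `'*.md'` would also match a nested path such as `docs/notes.md`.

```python
from fnmatch import fnmatchcase


def is_allowed(tag_file, allowed_patterns):
    """True if tag_file matches at least one Tag-Files-Allowed pattern.

    fnmatchcase keeps matching case-sensitive on every platform; its '*'
    also matches '/', unlike a POSIX shell glob.
    """
    return any(fnmatchcase(tag_file, pattern) for pattern in allowed_patterns)


# The three example values from the comment above.
allow_all = ["*"]
allow_markdown = ["*.md"]
allow_readme = ["README.md"]

print(is_allowed("Makefile", allow_all))         # True
print(is_allowed("CHANGES.md", allow_markdown))  # True
print(is_allowed("Makefile", allow_markdown))    # False
print(is_allowed("README.md", allow_readme))     # True
print(is_allowed("readme.md", allow_readme))     # False
```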

mjordan commented 5 years ago

Thanks for working through that. I like the idea of allowing wildcards. @ruebot what's your take?

ruebot commented 5 years ago

Yeah, wildcards make sense to me. Should those be explicitly mentioned? And, should this addition bump it to 2.0.0?

kba commented 5 years ago

> Yeah, wildcards make sense to me. Should those be explicitly mentioned?

I extended the description to include file patterns in addition to exact paths.

If Tag-Files-Allowed isn't provided (as is the case for every existing profile), the default would have to be ['*'].
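Assuming profiles are loaded from JSON into a dict, that backward-compatible default could be applied like this; allowed_patterns is a hypothetical helper name.

```python
import json

# When a profile omits Tag-Files-Allowed (true of every profile written before
# this proposal), fall back to ['*'] so that any tag file is allowed, as today.
DEFAULT_TAG_FILES_ALLOWED = ["*"]


def allowed_patterns(profile):
    # Return a copy so callers cannot mutate the shared default.
    return list(profile.get("Tag-Files-Allowed", DEFAULT_TAG_FILES_ALLOWED))


old_profile = json.loads('{"Tag-Files-Required": []}')
new_profile = json.loads('{"Tag-Files-Allowed": ["README.md", "Makefile"]}')

print(allowed_patterns(old_profile))  # ['*']
print(allowed_patterns(new_profile))  # ['README.md', 'Makefile']
```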

> And, should this addition bump it to 2.0.0?

Makes sense. IIUC, there currently isn't a way to express which version of this specification a profile itself conforms to? Maybe add a field BagIt-Profile-Version to BagIt-Profile-Info that defaults to 1.1.0, and allow newer changes like this one only if it is >= 2.0.0?
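One way a validator could read the proposed field, assuming BagIt-Profile-Version is a dotted string such as "2.0.0" inside BagIt-Profile-Info and that a missing value means 1.1.0, as suggested above. The helper name profile_version is hypothetical. Versions are compared as integer tuples rather than strings, because as strings "1.10.0" sorts before "1.2.0".

```python
def profile_version(profile):
    """Return BagIt-Profile-Version as a tuple of ints, e.g. (2, 0, 0).

    A missing field is treated as 1.1.0, the version before this proposal.
    """
    info = profile.get("BagIt-Profile-Info", {})
    raw = info.get("BagIt-Profile-Version", "1.1.0")
    parts = [int(p) for p in raw.split(".")]
    # Pad short versions such as "2" or "2.0" to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


print(profile_version({}))  # (1, 1, 0)

v2 = {"BagIt-Profile-Info": {"BagIt-Profile-Version": "2.0.0"}}
print(profile_version(v2) >= (2, 0, 0))  # True

v1_10 = {"BagIt-Profile-Info": {"BagIt-Profile-Version": "1.10.0"}}
print(profile_version(v1_10) > (1, 2, 0))  # True
```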

ruebot commented 5 years ago

> Maybe add a field BagIt-Profile-Version to BagIt-Profile-Info that defaults to 1.1.0, and allow newer changes like this one only if it is >= 2.0.0?

Yeah, that makes sense to me. @mjordan what do you think?

mjordan commented 5 years ago

That makes sense, but when we say "allow newer changes... if it is >= 2.0.0", do we need a way in the validator to check which features have been added since 1.1.0? For example, if a profile specifies Tag-Files-Allowed and BagIt-Profile-Version is missing or indicates 1.1.0, does validation fail?

kba commented 5 years ago

> do we need a way in the validator to check which features have been added since 1.1.0?

Yes, either by adding if clauses like

if self.profile_version >= (2, 0, 0):
    ...  # checks added in 2.0.0, such as Tag-Files-Allowed

or by subclassing Profile (e.g. class Profile20(Profile)) and overriding the relevant methods with the additional checks.
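A rough sketch of the subclassing approach. Profile, Profile20, validate_tag_files and make_profile are hypothetical names for illustration, not the API of an existing validator, and version handling is reduced to the major number. It also covers a 1.1.0 profile that contains Tag-Files-Allowed anyway: the base class never reads that key, so it is ignored.

```python
from fnmatch import fnmatchcase


class Profile:
    """Validation rules for profiles before 2.0.0 (hypothetical class)."""

    def __init__(self, profile):
        self.profile = profile

    def validate_tag_files(self, present):
        """Return error messages for the given tag file names."""
        required = self.profile.get("Tag-Files-Required", [])
        return [f"required tag file missing: {name}"
                for name in required if name not in present]


class Profile20(Profile):
    """Adds the Tag-Files-Allowed check proposed for 2.0.0."""

    def validate_tag_files(self, present):
        errors = super().validate_tag_files(present)
        allowed = self.profile.get("Tag-Files-Allowed", ["*"])
        errors += [f"tag file not allowed: {name}" for name in present
                   if not any(fnmatchcase(name, p) for p in allowed)]
        return errors


def make_profile(profile):
    """Pick the rule set from BagIt-Profile-Version (default 1.1.0)."""
    version = profile.get("BagIt-Profile-Info", {}).get(
        "BagIt-Profile-Version", "1.1.0")
    major = int(version.split(".")[0])
    return Profile20(profile) if major >= 2 else Profile(profile)


rules = {"Tag-Files-Allowed": ["README.md"]}
v2 = {**rules, "BagIt-Profile-Info": {"BagIt-Profile-Version": "2.0.0"}}

print(make_profile(v2).validate_tag_files(["README.md", "notes.txt"]))
# ['tag file not allowed: notes.txt']
print(make_profile(rules).validate_tag_files(["README.md", "notes.txt"]))
# [] because a 1.1.0 profile ignores Tag-Files-Allowed
```

Compared with scattering version checks through one class, the subclass keeps each version's rules in one place, at the cost of a small factory that picks the class.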

> if a profile specifies Tag-Files-Allowed and BagIt-Profile-Version is missing or indicates 1.1.0, does validation fail?

No, it should succeed in that case, since Tag-Files-Allowed would be treated like any other tag that the 1.1.0 specification does not define.

mjordan commented 5 years ago

Has this PR been obsoleted by #16?

kba commented 5 years ago

Yes, #16 contains these changes. Thanks for merging.

I just noticed that I used 1.2.0 instead of 2.0.0. But since the changes should be backwards-compatible, a minor version bump makes sense anyway.