Closed stain closed 9 months ago
I'm ok with this, and it makes sense. @tdilauro @jscancella do you have any concerns if all this was added?
Other question -- do we want to create a new version with this (1.5.0)? Or lump it in with all the updates we just did for https://github.com/bagit-profiles/bagit-profiles-specification/issues/31, and call it all 1.4.0?
I think it's fine to include in a 1.4.0 (or 1.4.1) release. Since we do have releases, I think it's fair to not consider a spec version to be published until it has been released.
@stain Regarding the presence of a `.keep` file to indicate an empty directory, I think that is an implementation detail to accommodate current archive formats that don't track empty directories. I don't think we should necessarily include it in the specification.
@stain Would you share your rationale for the distinction between the directory suffixes `/` and `/*`?
> I'm ok with this, and it makes sense. @tdilauro @jscancella do you have any concerns if all this was added?
I think it can work, but really, at what point are we duplicating the payload manifest file? I don't know; to me this just seems a bit too far (but that's just my $0.02).
@jscancella This is for the bag profile, so these would effectively be saying what must appear in the payload manifest files (directories excluded, of course). The payload manifest is used to help determine if a bag is a valid bag, whereas this would be used to help determine if a valid bag conforms to a specific profile.
@tdilauro thanks. To be clear, I understand the reasoning, I just think it is going too far. But I am just 1 person on this team, and if everyone else agrees that's ok!
Another question for everyone here: How do these constraints interact with "holey" bags / `fetch.txt`? My assumption is that a "holey" bag should be allowed to work the way that it is designed to work, but the current language would require these files/directories to be present in the bag. Requiring/allowing the files to be in the payload manifest(s), rather than present in the bag, could solve this problem for files, but would not work for directories. 🧐
@ruebot @jscancella I'm not working in the digital preservation/archiving space these days, but I'm wondering if there are opportunities for broader community engagement on changes to these specs.
Yeah, I haven't worked in this space either for some time (currently I am at USPTO). I think adding new team members or otherwise engaging the broader community would be a great idea.
I'm still in the space, but don't really work with bags anymore. So, it might be worth asking for more feedback on the digital curation list.
> Another question for everyone here: How do these constraints interact with "holey" bags / `fetch.txt`?
Upstream BagIt is not that fond of holey bags anymore, but I am (and they are used in bdbag together with ARK identifiers). So I would consider the profile and `Payload-Files-Required` to apply after completing the bag from `fetch.txt`, which would be needed anyway to successfully validate the manifest files.
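One way to read that ordering: run the profile check only once nothing listed in `fetch.txt` is still outstanding. Below is a minimal sketch, assuming the `URL LENGTH FILEPATH` line layout that RFC 8493 defines for `fetch.txt`. The helper name `pending_fetch_paths` is hypothetical, and percent-decoding of file paths is deliberately left out.

```python
def pending_fetch_paths(fetch_txt, present_files):
    """Return paths listed in fetch.txt that are not yet present in the bag.

    fetch_txt: the text content of fetch.txt
    present_files: set of file paths (relative to the bag root) already on disk
    """
    pending = []
    for line in fetch_txt.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "URL LENGTH FILEPATH"; the path may contain spaces,
        # so split at most twice and keep the remainder as the path.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"malformed fetch.txt line: {line!r}")
        _url, _length, path = parts
        if path not in present_files:
            pending.append(path)
    return pending
```

If this returns an empty list, the bag is complete and a `Payload-Files-Required` check would see the same files that manifest validation sees.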
> @stain Would you share your rationale for the distinction between the directory suffixes `/` and `/*`?
Mostly for consistency with the rest of the profile: `Tag-Files-Allowed` allows globs, while `Tag-Files-Required` only takes exact (non-glob) file paths. It also avoids `Payload-Files-Allowed` permitting both `foo/` and `foo/*`. Likewise, `Payload-Files-Required` insists on a trailing `/` so you can't have an ambiguous file-or-folder requirement `foo`.
Although BagIt can't record empty directories [without `.keep`](https://datatracker.ietf.org/doc/html/rfc8493#section-2.1.3) (as there's no checksum to add to the manifest), I would not want to mandate `.keep` to exist, as the directory may be non-empty.
(This means the profile can't mandate an empty directory, which I think is OK)
Thus the reasoning is that if a directory is permitted, it must contain something, hence the `*`.
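The semantics described here can be sketched as a small check. This is only an illustration under stated assumptions, not the specification's algorithm: `check_payload` is a hypothetical helper, paths are taken relative to the bag root, and glob matching uses Python's `fnmatch`, where `*` also matches `/` (so `data/foo/*` covers nested files too).

```python
from fnmatch import fnmatchcase


def check_payload(payload_files, required, allowed=None):
    """Return a list of problems; an empty list means the payload conforms.

    payload_files: file paths relative to the bag root, e.g. "data/a.txt"
    required: exact file paths, or directory paths ending in "/"
    allowed: glob patterns such as "data/foo/*", or None for no restriction
    """
    problems = []
    for entry in required:
        if entry.endswith("/"):
            # A manifest cannot list a directory, so a required directory
            # is satisfied by any file underneath it.
            if not any(p.startswith(entry) for p in payload_files):
                problems.append(f"required directory missing or empty: {entry}")
        elif entry not in payload_files:
            problems.append(f"required file missing: {entry}")
    if allowed is not None:
        for path in payload_files:
            if not any(fnmatchcase(path, pattern) for pattern in allowed):
                problems.append(f"file not allowed: {path}")
    return problems
```

Because a required directory is only satisfied by a file inside it, this sketch cannot express "an empty directory must exist", which matches the limitation noted above.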
> @stain Regarding the presence of a `.keep` file to indicate an empty directory, I think that is an implementation detail to accommodate current archive formats that don't track empty directories. I don't think we should necessarily include it in the specification.
It's not just an implementation detail; it's the filename suggested by the BagIt spec. I can nevertheless change this to just say "placeholder file" and reference back to RFC 8493 section 2.1.3:
> A manifest MUST NOT reference directories. Bag creators who wish to create an otherwise empty directory have typically done so by creating an empty placeholder file with a name such as ".keep".
Hi, I am trying to update our Research Object BagIt Profile to match our RO-Crate requirements for BagIt.
One of the big changes is that we moved our metadata file from `metadata/metadata.json` to `data/ro-crate-metadata.json` (to avoid rewriting paths to `"../data"` if archiving an existing RO-Crate).
However, in the BagIt Profiles there is no payload equivalent to `Tag-Files-Required`, so I can't express any requirements for the `data/` folder. It would technically be possible to do `"Tag-Files-Required": ["data/ro-crate-metadata.json"]`, but even if these tag files need not be listed in tag manifest files, being in `data/` this is a payload file, not a tag file.
I would suggest adding `Payload-Files-Required` and, I guess for consistency, `Payload-Files-Allowed`:
BTW, I think we need to add the "Conformants bags MUST NOT contain" phrase also to entry 12 on "Tag-Files-Allowed", as technically the use of "MAY" means the files are optional - it doesn't specify other files are not permitted!
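For the RO-Crate case described in this issue, a profile using the two proposed fields might look roughly like the fragment below. Only the field names and the `data/ro-crate-metadata.json` path come from this thread. The `data/*` pattern and the variable name `profile_fragment` are assumptions for illustration.

```python
import json

# Hypothetical profile fragment. The "data/*" glob is an assumed value,
# chosen so that the required file is also covered by the allowed list.
profile_fragment = {
    "Payload-Files-Required": ["data/ro-crate-metadata.json"],
    "Payload-Files-Allowed": ["data/*"],
}

# Serialize the fragment as it would appear inside a profile JSON document.
print(json.dumps(profile_fragment, indent=2))
```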