Open bertsky opened 5 months ago
Here is the request we talked about during our meeting today. Please take a look at the following block of code:
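from ocrd import Workspace
from ocrd.workspace_bagger import WorkspaceBagger

# resolver, workspace_dir, mets_basename, ocrd_identifier and bag_dest are defined elsewhere
workspace = Workspace(resolver, directory=workspace_dir, mets_basename=mets_basename)
WorkspaceBagger(resolver).bag(
    workspace,
    ocrd_identifier=ocrd_identifier,
    dest=bag_dest,
    ocrd_mets=mets_basename,
    processes=1,
)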
It would be great if the WorkspaceBagger.bag() method also took an extra flag, skip_download, to avoid downloading file groups that do not exist in local storage. There are, of course, whitelist and blacklist options, include_fileGrp and exclude_fileGrp, that achieve this by simply ignoring some file groups, but that requires extra steps plus knowledge of which file groups are locally available and which are not. I am mainly interested in doing this programmatically. How the bagger CLI should handle skip_download does not matter much, so there are no extra requirements there.
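To illustrate the "extra steps" the whitelist workaround currently needs, here is a minimal sketch that reads the METS file directly and lists the file groups whose files are all present on disk. The helper names local_file_groups and is_local are hypothetical and not part of OCR-D; the sketch assumes FLocat hrefs are either http(s) URLs or paths relative to the directory containing the METS file, and it treats a group as local only if every one of its files exists.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard METS and XLink namespaces
NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def is_local(href, workspace_dir):
    """Hypothetical helper: True if href is a relative path to an existing file."""
    if urlparse(href).scheme in ("http", "https"):
        return False
    return os.path.isfile(os.path.join(workspace_dir, href))


def local_file_groups(mets_path):
    """Hypothetical helper: USE names of file groups whose files all exist locally."""
    workspace_dir = os.path.dirname(os.path.abspath(mets_path))
    root = ET.parse(mets_path).getroot()
    available = []
    for grp in root.iterfind(".//mets:fileGrp[@USE]", NS):
        hrefs = [
            loc.get(XLINK_HREF, "")
            for loc in grp.iterfind("mets:file/mets:FLocat", NS)
        ]
        # An empty group counts as not available in this sketch
        if hrefs and all(is_local(h, workspace_dir) for h in hrefs):
            available.append(grp.get("USE"))
    return available
```

The resulting list could then be handed to the include_fileGrp option mentioned above; the exact type that option expects is an assumption here. A skip_download flag would make this pre-computation unnecessary.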
It would be nice if ocrd zip bag supported creating partial clones with some FLocats as mere URLs instead of local paths in the payload.
Possible use cases:
On the CLI, it would just be another option, but I am not sure it is even allowed in the BagIt data format.