contentauth / c2patool

Command line tool for displaying and adding C2PA manifests
Apache License 2.0

C2PA scheme for directories of content #135

Open gar1t opened 10 months ago

gar1t commented 10 months ago

I'm looking at C2PA as a scheme for adding provenance to machine-learned files. In my case, a training exercise typically generates multiple files, which are written to a directory.

I'd like to apply a C2PA-style scheme to the contents of that directory as a single unit.

This content could be represented as a tar file, but this is not how it's stored. Generated files need to be available in the original directory structure.

Do you have any recommendations on how I should approach this?

gpeacock commented 2 months ago

I've been planning to rework c2patool to support wildcards for some time. In your case, would you want it to overwrite the original files with the updated ones? Another approach might be an output directory for files with the same subpath/name.

gar1t commented 2 months ago

Directories are potentially very expensive to work with. I'd be nervous about ever copying a directory.

It would be handy to just create a sidecar for a directory. This would solve the problem I have today.

Today, since sidecars only apply to a single file, I need to either create a zip file (very expensive in some cases) or create a separate home-spun manifest with digests that I can use to verify files, but out of band from C2PA. I'd much rather have c2patool check the claims and verify file integrity without my having to generate a container file (zip, etc.).
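
For illustration, a home-spun manifest of that kind might look roughly like the sketch below. This is plain Python, not c2patool or anything defined by the C2PA spec; the JSON layout and the names `file_digest`, `build_manifest`, and `write_sidecar` are made up for the example, and SHA-256 is just an assumed choice of digest.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its digest.

    The layout (an "algorithm" field plus a "files" map) is hypothetical.
    """
    files = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"algorithm": "sha256", "files": files}


def write_sidecar(root: Path, sidecar: Path) -> None:
    """Write the manifest to a sidecar file kept outside the directory."""
    manifest = build_manifest(root)
    sidecar.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Something like `write_sidecar(Path("runs/exp1"), Path("runs/exp1.manifest.json"))` leaves the directory itself untouched (both paths are placeholders). The sidecar should live outside the directory, otherwise it would be listed as one of the files on the next run. This only covers file integrity; it carries no signed claims, which is the part I'd want C2PA for.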

If I had to choose between a sidecar and in-directory mods, I'd easily opt for the sidecar.

The benefit of embedding the manifest in the directory is that it ostensibly makes the directory portable (no sidecars to copy). But this is a stretch - there's nothing portable about a directory of files to begin with. Someone might make the case for this, but I'd wait to see if sidecars aren't totally sufficient.

Sidecar-only support for directories removes the problem of whether to edit in place or to copy.

Is there a technical reason sidecars must use the same name as their target file? Supporting explicit sidecar names would allow for multiple manifests for a given file (or directory). (This assumes the sidecars contain a reference to their target files, as opposed to relying on the file naming convention.)

Making sidecar names independent of the target files and supporting multiple external files (i.e. the directory case) would, I think, be the end game for generalized content.
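
As a rough sketch of what "independent of the target files" could mean in practice: the sidecar names its own target and lists the files it covers, so its filename no longer matters and several sidecars can point at the same directory. The JSON fields (`target`, `files`) and the function `verify_sidecar` are hypothetical, not an existing c2patool or C2PA format, and only digests are checked here, not signatures or claims.

```python
import hashlib
import json
from pathlib import Path


def verify_sidecar(sidecar: Path) -> list[str]:
    """Check the files listed in a sidecar and return any problems found.

    Assumed (hypothetical) sidecar layout:
      {"target": "<dir relative to the sidecar>",
       "files": {"<relative path>": "<sha256 hex digest>", ...}}
    """
    doc = json.loads(sidecar.read_text())
    # The target is resolved from the sidecar's contents, not its filename.
    root = (sidecar.parent / doc["target"]).resolve()
    problems = []
    for rel, expected in doc["files"].items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        # read_bytes() loads the whole file; fine for a sketch, but large
        # files would be better hashed in chunks.
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append(f"modified: {rel}")
    return problems
```

An empty list means every listed file is present and unchanged. Files that exist in the directory but are not listed are ignored in this sketch; whether they should count as a failure is a design question for the directory case.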