Document process to notarize Workflow MetaData

von commented 7 years ago

Series of calls - shared workflow tarball.
Omkar tried VM on AWS, tried
SM: Still trying to figure out what this means and what it would take. It is not obvious what the metadata is, since many systems are involved.
KV: Currently, Pegasus simply trusts the user that the files are the same when, e.g., multiple copies of an input file are provided.
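One way to avoid simply trusting the user here would be to compare content digests of the copies. A minimal sketch, assuming every copy is reachable as a local file path; the helper names `file_digest` and `replicas_match` are hypothetical and not part of Pegasus:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def replicas_match(paths):
    """True only if every copy of the input file has the same SHA-256 digest.

    An empty list of paths returns False (nothing to vouch for).
    """
    digests = {file_digest(p) for p in paths}
    return len(digests) == 1
```

The digests themselves would then be natural candidates for the metadata that gets notarized.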

von commented 7 years ago

@rynge Summary: There will be a workflow-specific key; we'll sign the kickstart records on the remote site, and at the end we'll sign the workflow database. This is still a bit vague. What we really want is help from the compute nodes, e.g. a trusted site key. Von: This may be a good idea for the future infrastructure that we mentioned in the proposal.
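The summary above could be prototyped roughly as follows. This is a sketch under several assumptions: HMAC-SHA256 with a shared per-workflow secret stands in for a real public-key signature, kickstart records are treated as plain dicts serialized to canonical JSON, and the "workflow database" is reduced to the ordered list of signed records. All function names are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def new_workflow_key() -> bytes:
    """Generate a fresh per-workflow secret key (hypothetical helper)."""
    return secrets.token_bytes(32)


def tag(key: bytes, payload: bytes) -> str:
    """HMAC-SHA256 tag; a stand-in for a real signature scheme."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def canonical(record: dict) -> bytes:
    """Deterministic byte encoding of a kickstart record."""
    return json.dumps(record, sort_keys=True).encode()


def sign_kickstart_record(key: bytes, record: dict) -> dict:
    """Run on the remote site: attach a tag to one kickstart record."""
    return {"record": record, "tag": tag(key, canonical(record))}


def verify_kickstart_record(key: bytes, signed: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(signed["tag"], tag(key, canonical(signed["record"])))


def seal_workflow_db(key: bytes, signed_records: list) -> str:
    """Run at the end: one tag over the ordered record tags.

    Tags are fixed-length hex strings, so plain concatenation is unambiguous.
    """
    digest = hashlib.sha256()
    for signed in signed_records:
        digest.update(signed["tag"].encode())
    return tag(key, digest.digest())
```

Because the seal covers the ordered record tags, dropping, altering, or reordering a record changes it. A real deployment would swap `tag` for an asymmetric scheme so that verifiers never hold the signing key, which is also where a trusted site key on the compute nodes would come in.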

von commented 7 years ago

This is in Steve & Omkar's court. Not in the critical path. Marking this as a year 3 task.

steveamyers commented 7 years ago

So, I had some thoughts on this. First, looking forward: for machines that support Intel's new SGX secure compute platform, we can write some code that essentially attests to the fact that it is running on the given machine. If the workflow itself is written to run this way, the code could also cross-attest to the actual workflow (though, because of the compute resources involved, I suspect it will not be). Interestingly, for large platforms that are not Intel, it might be possible to have a middleware Intel box that does support SGX and simply attests to this.

It would be really good if we could get some statistics on which platforms represent large segments of most workflows.

Lastly, I'm wondering whether we could get a network effect from having individual machines sign jobs. We can't trust any specific machine, but the different machines are mostly independently administered (an assumption). So if multiple sources in, say, a diamond flow validate that a given node was given a job, then even if that node denies it or suggests another job, we might have some root of trust based simply on the number of votes/signatures claiming that a given job was or was not run. I'll clarify this shortly. However, for this to have meaning, it still requires some form of sealing of keys on the local hosts, and/or transporting them back and forth to the correct machines, and only the correct machines. It's not clear we can accomplish even this.
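A toy version of the vote-counting idea, assuming each independently administered witness host already holds its own sealed key (the open problem noted above). HMAC again stands in for per-host signatures, and the claim format, the quorum rule, and all names are hypothetical illustrations rather than a worked-out design:

```python
import hashlib
import hmac
from collections import Counter


def claim_bytes(job_id: str, node: str, ran: bool) -> bytes:
    """Canonical encoding of the claim 'job_id was (or was not) run on node'."""
    return f"{job_id}|{node}|{'ran' if ran else 'not-run'}".encode()


def attest(witness_key: bytes, job_id: str, node: str, ran: bool) -> str:
    """A witness host's tag over a claim (stand-in for a signature)."""
    return hmac.new(witness_key, claim_bytes(job_id, node, ran), hashlib.sha256).hexdigest()


def tally(witness_keys, attestations, job_id, node, quorum):
    """Decide whether job_id ran on node from independent witness votes.

    attestations: iterable of (witness_name, ran, tag) tuples.
    Returns True or False when one side reaches the quorum and outnumbers
    the other side; otherwise None (undecided).
    """
    votes = Counter()
    counted = set()
    for witness, ran, tag in attestations:
        key = witness_keys.get(witness)
        if key is None or witness in counted:
            continue  # unknown witness, or it already cast a valid vote
        if hmac.compare_digest(tag, attest(key, job_id, node, ran)):
            votes[ran] += 1
            counted.add(witness)
    for outcome in (True, False):
        if votes[outcome] >= quorum and votes[outcome] > votes[not outcome]:
            return outcome
    return None
```

In this sketch a single host denying a job cannot override two valid independent attestations, which is the intuition behind a root of trust based on numbers of votes; it offers no protection against witnesses that collude.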

IU-CACR / SWIP

Document process to notarize Workflow MetaData #5