Joystream / storage-node-joystream

Colossus - Storage node for Joystream
GNU General Public License v3.0

Sign/verify uploads on storage node #5

Open jfinkhaeuser opened 5 years ago

jfinkhaeuser commented 5 years ago

The current thinking is that verification can be arbitrarily complex, and include scanning for the right kind of data, etc. Verifying signatures is really more of a low hanging fruit/first stab at this.

jfinkhaeuser commented 5 years ago

This issue is left dangling here with very little further explanation. We've been discussing it in the context of https://github.com/Joystream/joystream/pull/45, however, so it should be specced out better.

jfinkhaeuser commented 5 years ago

The following should become part of a future testnet spec, probably Rome.


Motivation

One concern raised during the specification of the Acropolis testnet is that there is currently no mechanism for preventing misbehaving Liaison storage nodes from storing arbitrary content under any given ContentId. There are several ways of marking content as valid:

  1. Have uploaders include an explicit reference to the storage backend's identifier, on the assumption that the storage backend uses some form of verification. That's not an ideal option, as it leaks storage layer abstractions all the way up to the application layer.
  2. Have ContentId be a hash over the content. This validates the content, but not the uploader.
  3. Have ContentId be a signature over the content. This validates the content and the uploader.

For the reasons briefly outlined above, the proposal is to go with the last option.

Signing Scheme

Substrate uses ed25519 or sr25519 keys, so signing with them seems like the simplest option. The uploader of a DataObject already registers themselves as the owner of this DataObject, meaning all relevant information for verifying a signature with this ID is already available to consumers of DataObject.

The only arguments against reusing this ownership ID are:

  1. It inextricably ties an account to content in a way that makes changing accounts or sharing ownership impossible. For this reason, many systems adopt ephemeral content keys instead.
  2. It makes it impossible to use any key type other than those directly supported by Substrate.

For these reasons, the suggestion is to introduce a signing_pubkey field to DataObject that holds a self-describing public key where the corresponding secret/private key has been used to sign the content.

The downside of this approach is that verifying a creator is now possible, but not that the creator is actually the same person as the owner of the DataObject. A future extension to the runtime might include a full-blown Public Key Infrastructure for this purpose, but that's beyond the scope of the storage module.

On the other hand, the clear upside is that we buy ourselves future use-cases where entirely ephemeral keys are used per DataObject, or organizations comprised of many accounts use the same key pair for creating assets.

Upload Changes

  1. The DataObject creation must now include the signing_pubkey.
  2. During upload to the Liaison, a signature using this key is also transmitted, e.g. as an HTTP header field.
  3. During synchronization from Liaison to other storage providers, the signature is also transmitted.

Download Changes

  1. The storage provider transmits the signature in an HTTP header field along with the content.
  2. Clients may verify the signature.

There is an issue here with regard to streaming content: a consuming app may want to start rendering video or audio before all content data has been transmitted, making signature verification at the consumption stage less feasible. The proposed scheme assumes that playback and verification of content are two distinct use cases that need not happen concurrently.

Further Considerations

  1. There is a case for tying the content more closely to the signature by uploading a concatenation of the content and signature, forgoing header fields entirely. This makes storage and retrieval of both considerably simpler. The downside is that consumption would require the additional step of splitting the signature from the content. This split could happen on the storage provider side during download, meaning only the upload process above would change.
  2. An upload/download scheme that explicitly chunks content and signs each chunk separately would permit verification of each chunk prior to consumption. It is questionable, though, whether the extra overhead introduced is worth it. Effectively, each ContentId would have to map to a sequence of ChunkIds on the runtime, each signed and stored independently. On the consuming side, re-assembling the stream after validating each chunk is not painless either.