cooling-singapore / saas-middleware

Simulation-as-a-Service (SaaS) Middleware
MIT License

Invariant Object Ids #91

Closed HeikoAydt closed 3 weeks ago

HeikoAydt commented 3 years ago

Description

Object ids are derived from the hashes of the data object content and descriptor. A potential problem arises if the content is encrypted: the exact same content encrypted with different keys (or not encrypted at all) results in different object ids. In itself this is not necessarily a problem, but certain functionality of the SaaS nodes is affected. This includes analysis of data provenance meta information, because seemingly different data objects lead to a different provenance graph structure. Avoiding redundant data storage is also affected, because the DOR can no longer identify identical data objects. On the other hand, this may actually be the correct behaviour: if the data object is encrypted, the DOR shouldn't be able to say anything about the content anyway.
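To make the problem concrete, here is a minimal sketch. The `object_id` function is a hypothetical stand-in for the middleware's id scheme (the issue only says ids are derived from hashes of content and descriptor), the descriptor fields are made up, and `toy_encrypt` is an insecure XOR stream cipher used purely so the example runs with the standard library.

```python
import hashlib
import json


def toy_encrypt(content: bytes, key: bytes) -> bytes:
    # Toy XOR stream cipher for illustration only (NOT secure).
    # The keystream is derived from the key with SHAKE-256.
    keystream = hashlib.shake_256(key).digest(len(content))
    return bytes(c ^ k for c, k in zip(content, keystream))


def object_id(content: bytes, descriptor: dict) -> str:
    # Hypothetical id scheme: hash the content and the descriptor
    # separately, then hash the two digests together.
    c_hash = hashlib.sha256(content).hexdigest()
    d_json = json.dumps(descriptor, sort_keys=True).encode("utf-8")
    d_hash = hashlib.sha256(d_json).hexdigest()
    return hashlib.sha256((c_hash + d_hash).encode("utf-8")).hexdigest()


content = b"temperature,humidity\n30.1,0.82\n"
descriptor = {"data_type": "csv", "data_format": "text"}  # made-up fields

id_plain = object_id(content, descriptor)
id_key_a = object_id(toy_encrypt(content, b"key-a"), descriptor)
id_key_b = object_id(toy_encrypt(content, b"key-b"), descriptor)

# Same logical content, three different object ids.
print(id_plain == id_key_a, id_key_a == id_key_b)
```

From the DOR's point of view these are three unrelated data objects, which is why both deduplication and provenance analysis break.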

Ideally, object ids would be based on the non-encrypted content and remain invariant regardless of whether encryption is used. The question is how to verify that such an object id is indeed valid. An entirely different scheme for object ids may be necessary.

Outcomes

HeikoAydt commented 3 years ago

Truly invariant object ids are probably not possible without a fair amount of homomorphic encryption, which is out of reach for now. The second-best solution would be to distinguish between object ids that are invariant and those that are not.

Rationale: the DOR is responsible for generating the object id and has to be able to guarantee certain properties of the object id. If a DOR claims an object id to be invariant, regardless of content encryption, then it needs the ability to ensure this. For example, one trade-off could be that the DOR is responsible for (re-)encryption of data object contents. That way the DOR can guarantee object id invariance, at the expense of the user having to trust the DOR. Alternatively, a user may choose to encrypt data objects themselves and thus not have to trust the DOR, at the expense of losing object id invariance (that's how it is done currently). The system would then support both cases, and users can opt for whichever suits their needs better.
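A rough sketch of how the two cases could coexist, assuming a hypothetical `ToyDOR` class and `ObjectId` type (neither exists in the middleware under these names). The descriptor is left out for brevity, and `toy_encrypt` is again an insecure XOR stand-in for real encryption.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectId:
    value: str
    invariant: bool  # True only if the DOR derived the id from plaintext


def toy_encrypt(content: bytes, key: bytes) -> bytes:
    # Toy XOR stream cipher for illustration only (NOT secure).
    keystream = hashlib.shake_256(key).digest(len(content))
    return bytes(c ^ k for c, k in zip(content, keystream))


class ToyDOR:
    """Hypothetical DOR supporting both trust models."""

    def __init__(self, storage_key: bytes):
        self._storage_key = storage_key
        self._store = {}  # id value -> stored (encrypted) bytes

    def add_trusted(self, plaintext: bytes) -> ObjectId:
        # The user trusts the DOR with the plaintext. The id is derived
        # from the plaintext, and the DOR encrypts the content itself.
        value = hashlib.sha256(plaintext).hexdigest()
        self._store.setdefault(value, toy_encrypt(plaintext, self._storage_key))
        return ObjectId(value, invariant=True)

    def add_opaque(self, ciphertext: bytes) -> ObjectId:
        # The user encrypted the content themselves. The DOR only sees
        # ciphertext, so the id depends on the user's key.
        value = hashlib.sha256(ciphertext).hexdigest()
        self._store.setdefault(value, ciphertext)
        return ObjectId(value, invariant=False)

    def size(self) -> int:
        return len(self._store)


dor = ToyDOR(storage_key=b"dor-key")
data = b"some data object content"

t1 = dor.add_trusted(data)
t2 = dor.add_trusted(data)  # deduplicated: same id, stored once
o1 = dor.add_opaque(toy_encrypt(data, b"user-key-a"))
o2 = dor.add_opaque(toy_encrypt(data, b"user-key-b"))

print(t1 == t2, o1.value == o2.value, dor.size())
```

The `invariant` flag tells consumers such as provenance analysis which ids can be compared meaningfully across encryption settings. The sketch does not address how a third party would verify an id that the DOR claims to be invariant.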

HeikoAydt commented 3 weeks ago

Obsolete.