RFC: Introduce a 'Policy Repository'

matgnt commented 1 year ago

I'd like to ask for feedback on the following proposal.

Snippet from the catalog example:

      "odrl:hasPolicy": [ 
        {
          "@context": {
            "@vocab": "https://www.w3.org/TR/odrl-model/"
          },
          "@id": "urn:uuid:2828282:3dd1add8-4d2d-569e-d634-8394a8836a88",
          "permission": [
            {
              "action": "use",
              "constraint": [
                {
                  "leftOperand": {
                    "@value": "spatial"
                  },
                  "rightOperand": {
                    "@value": "EU"
                  },
                  "operator": "EQ"
                }
              ],
              "duty": []
            }
          ],
          "prohibition": [],
          "obligation": []
        }
      ],

https://raw.githubusercontent.com/International-Data-Spaces-Association/ids-specification/71f06f718147f12a4d333e9a9e604d13944882b1/catalog/message/catalog.json

This is quite redundant, because many of these policies / offers are very similar. I think it would also be allowed to use IDs as references instead of embedding the entire object node, meaning:

"odrl:hasPolicy": [
    {
        "@type": "odrl:Offer",
        "@id": "https://provider.com/edc/offer/1"
    },
    { // next possible policy }
],

At least that is my - still limited - understanding of: https://www.w3.org/TR/vocab-dcat-3/#conformance

Additional constraints in a profile MAY include: Controlled vocabularies or IRI sets as acceptable values for properties

If that is the case, I would even go a step further and slightly change the ID of the policy, turning it into "content-addressable storage" by using the hash of the policy itself (as a URL) to reference it. It would look like this:

"odrl:hasPolicy": [
    {
        "@type": "odrl:Offer",
        "@id": "https://provider.com/policies/cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed"
    },
    { // next possible policy }
],

where cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed is the SHA-256 hash of the policy itself. Serialization and canonicalization details would need to be defined, of course.

That means there would be a new endpoint containing all possible policies: a policy repository under /policies/{hash}
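A minimal sketch of how such an identifier could be derived. It assumes the policy is hashed without its own "@id", and it uses sorted keys plus compact separators as a stand-in for a proper canonicalization scheme, which the proposal leaves open. The function names and the `https://provider.com` base URL are illustrative only.

```python
import hashlib
import json


def canonical_bytes(policy: dict) -> bytes:
    """Serialize deterministically (assumption: sorted keys, no whitespace).

    This is a simplified stand-in, not a standardized JSON-LD canonicalization.
    """
    return json.dumps(
        policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def policy_hash(policy: dict) -> str:
    """Hex-encoded SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(policy)).hexdigest()


def policy_url(base_url: str, policy: dict) -> str:
    """Build the hypothetical repository URL <base>/policies/{hash}."""
    return f"{base_url.rstrip('/')}/policies/{policy_hash(policy)}"


constraint = {
    "leftOperand": {"@value": "spatial"},
    "rightOperand": {"@value": "EU"},
    "operator": "EQ",
}
policy = {
    "@context": {"@vocab": "https://www.w3.org/TR/odrl-model/"},
    "permission": [{"action": "use", "constraint": [constraint], "duty": []}],
    "prohibition": [],
    "obligation": [],
}
# Same content, different key order: the identifier must not change.
reordered = {key: policy[key] for key in reversed(list(policy))}

print(policy_url("https://provider.com", policy))
print(policy_hash(policy) == policy_hash(reordered))
```

Any real deployment would need both sides to agree on the exact canonical form, otherwise provider and consumer compute different hashes for the same policy.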

This is only possible at all because of the separate profile, which says: https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/catalog/catalog.protocol.md#5-dcat-and-odrl-profiles

Each ODRL Offer must NOT include an explicit target attribute.

because hashing the policy WITH a target wouldn't work :-)
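To illustrate why the target has to stay out of the hashed policy: two datasets with identical rules would otherwise end up with different hashes. A small sketch under the same simplified canonicalization assumption (sorted keys, compact JSON), with made-up dataset IDs:

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    # Assumption: sorted-key, compact JSON as a simplified canonical form.
    data = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


rules = {"permission": [{"action": "use"}], "prohibition": [], "obligation": []}

# Without a target, both datasets can share one policy document.
shared = policy_hash(rules)

# With a target, the dataset ID becomes part of the hashed content.
# "urn:dataset:1" and "urn:dataset:2" are hypothetical IDs.
with_target_1 = policy_hash({**rules, "target": "urn:dataset:1"})
with_target_2 = policy_hash({**rules, "target": "urn:dataset:2"})

# One hash per dataset: nothing could be shared or deduplicated any more.
print(with_target_1 != with_target_2)
print(shared not in (with_target_1, with_target_2))
```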

The advantages would be:

Reduce data transfer, assuming many datasets use the same policy
Reduce processing overhead (on both sides, but mainly) on the consumer side, because it's immediately clear whether the policy is already known and can be accepted (allow-listed policies...)
Uniqueness of policies may allow further optimizations during the flow
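The consumer-side shortcut could look roughly like this. It is only a sketch: the allow-list, the helper names, and the convention that the hash is the last path segment of the "@id" are assumptions taken from the example URLs above, not something the spec defines.

```python
import re
from typing import Optional, Set
from urllib.parse import urlparse

SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def hash_from_reference(policy_id: str) -> Optional[str]:
    """Return the trailing SHA-256 hex digest of a policy URL, if present."""
    last_segment = urlparse(policy_id).path.rsplit("/", 1)[-1]
    return last_segment if SHA256_HEX.fullmatch(last_segment) else None


def is_known_policy(entry: dict, allow_list: Set[str]) -> bool:
    """True if the referenced policy hash is already allow-listed."""
    digest = hash_from_reference(entry.get("@id", ""))
    return digest is not None and digest in allow_list


# Hashes of policies the consumer has reviewed before (hypothetical content).
allow_list = {"cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed"}

known = {
    "@type": "odrl:Offer",
    "@id": "https://provider.com/policies/"
    "cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed",
}
unknown = {"@type": "odrl:Offer", "@id": "https://provider.com/edc/offer/1"}

print(is_known_policy(known, allow_list))
print(is_known_policy(unknown, allow_list))
```

A consumer that does fetch a referenced policy would still want to recompute the hash over the received content before trusting the identifier.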

Possible DISadvantages:

Are policies protected content already? And is the hash 'unique' enough to 'protect' its content? Further access control could be applied, too, but probably is not worth it. I guess it's better to add the full policy to the dataset instead of a reference if there are concerns. Can be decided per dataset of course.

Any thoughts on this?

ssteinbuss commented 1 year ago

https://www.w3.org/TR/odrl-model/#policy-has

matgnt commented 1 year ago

Summary of the discussion in the weekly meeting:

JSON-LD allows this
dataspace should define whether such document resolutions must be allowed. This could allow relative URLs and potentially allow-listed dataspace specific repositories. Resolving any 3rd party reference is considered a security issue and should not be done.
the hash identifier was considered a good idea to uniquely identify a policy
the transfer size is not considered an issue, because the response could be gzipped content
we'll add a 'Note' box to the spec to make it clear to the reader that this is a dataspace decision.
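A resolution rule along the lines discussed could be sketched as follows. The function name, the example hosts, and the choice to treat the catalog's own host as trusted are assumptions for illustration; the actual rule would be a dataspace decision.

```python
from typing import Optional, Set
from urllib.parse import urljoin, urlparse


def resolve_policy_reference(
    reference: str, catalog_url: str, allowed_hosts: Set[str]
) -> Optional[str]:
    """Return an absolute URL to fetch, or None if the reference must not be resolved."""
    # Relative references are resolved against the catalog URL.
    absolute = urljoin(catalog_url, reference)
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https"):
        return None  # e.g. urn:uuid:... identifiers are not resolvable
    catalog_host = urlparse(catalog_url).hostname
    if parsed.hostname == catalog_host or parsed.hostname in allowed_hosts:
        return absolute
    return None  # third-party reference: do not fetch


catalog_url = "https://provider.com/catalog/request"  # hypothetical endpoint
allowed_hosts = {"policies.dataspace.example"}  # hypothetical dataspace repository

for ref in (
    "/policies/abc",
    "https://policies.dataspace.example/policies/abc",
    "https://third-party.example/p/1",
    "urn:uuid:3dd1add8-4d2d-569e-d634-8394a8836a88",
):
    print(ref, "->", resolve_policy_reference(ref, catalog_url, allowed_hosts))
```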

Thanks for the discussion during the meeting, Matthias Binzer

matgnt commented 1 year ago

This discussion in ODRL might be also relevant: https://github.com/w3c/odrl/issues/12

sebbader-sap commented 1 year ago

Adding my two (three) cents:

the transfer size is not considered an issue, because the response could be gzipped content

In my view this is something to consider, as the (HTTP) protocol binding should make an either-or decision:

either the content is plain JSON (then the size is a factor), or
the content shall be compressed, or
both are possible (then both clients and servers need to implement functions for both)

For now, option 1 is described, therefore the size of the body can be a factor.
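To get a feeling for the numbers, one could compare the body sizes directly. This is a rough sketch with a synthetic catalog of 200 datasets that all share one policy; the catalog layout, dataset IDs, and counts are made up, and real catalogs will behave differently.

```python
import gzip
import json

POLICY = {
    "@context": {"@vocab": "https://www.w3.org/TR/odrl-model/"},
    "permission": [
        {
            "action": "use",
            "constraint": [
                {
                    "leftOperand": {"@value": "spatial"},
                    "rightOperand": {"@value": "EU"},
                    "operator": "EQ",
                }
            ],
            "duty": [],
        }
    ],
    "prohibition": [],
    "obligation": [],
}
REFERENCE = {
    "@type": "odrl:Offer",
    "@id": "https://provider.com/policies/"
    "cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed",
}


def build_catalog(dataset_count: int, inline: bool) -> bytes:
    """Serialize a synthetic catalog with inline or referenced policies."""
    datasets = [
        {"@id": f"urn:dataset:{i}", "odrl:hasPolicy": [POLICY if inline else REFERENCE]}
        for i in range(dataset_count)
    ]
    return json.dumps({"dcat:dataset": datasets}).encode("utf-8")


inline_plain = build_catalog(200, inline=True)
referenced_plain = build_catalog(200, inline=False)

# Print plain and gzipped sizes in bytes for both variants.
for label, body in (("inline", inline_plain), ("referenced", referenced_plain)):
    print(label, len(body), len(gzip.compress(body)))
```

As long as the binding only describes plain JSON, the first number in each row is the one that matters.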

the hash identifier was considered a good idea to uniquely identify a policy

Putting additional information (like type declarations or a content hash) into an identifier is, if I remember correctly, regarded as a not-so-good pattern in the RDF/Linked Data world. I don't remember all the details, but most likely it boils down to the fact that the referenced JSON document usually (always?) needs to contain the identifier itself:

{
    "@context": {
        "@vocab": "https://www.w3.org/TR/odrl-model/"
    },
    "@id": "https://provider.com/policies/cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed",
    "permission": [
        ...
    ]
}

In that case, "cdfd26aaf5b1fdc6d71af7c1349869f9314b67626bc1eec44e64af674e357eed" cannot be the hash of the JSON document as it would need to contain itself...
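One conceivable way around the self-reference, not something agreed in this thread, would be to define the hash over the document with its own "@id" left out and to strip the "@id" again before verifying. A sketch under that assumption, again with sorted-key compact JSON as a simplified canonical form and hypothetical helper names:

```python
import hashlib
import json


def content_hash(document: dict) -> str:
    """Hash the document with its own "@id" left out (assumption)."""
    body = {key: value for key, value in document.items() if key != "@id"}
    data = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def with_content_id(body: dict, base_url: str) -> dict:
    """Attach an "@id" derived from the content of the remaining fields."""
    return {**body, "@id": f"{base_url}/policies/{content_hash(body)}"}


def id_matches_content(document: dict) -> bool:
    """Check that the hash in the "@id" matches the rest of the document."""
    return document.get("@id", "").endswith("/" + content_hash(document))


body = {
    "@context": {"@vocab": "https://www.w3.org/TR/odrl-model/"},
    "permission": [{"action": "use"}],
}
document = with_content_id(body, "https://provider.com")
tampered = {**document, "prohibition": [{"action": "distribute"}]}

print(id_matches_content(document))
print(id_matches_content(tampered))
```

Whether excluding "@id" from the hashed content is acceptable from a Linked Data point of view is exactly the open question raised here.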

dataspace should define whether such document resolutions must be allowed.

I'd like to see it in the schemas, either "only expect '@id' here" (reference case) or "expect a full odrl:Offer object here" to reduce the degree of freedom for the individual implementations / increase their interoperability. But this makes it more complicated for different data spaces as they would need deviating schema files, manage their versions accordingly, ...

matgnt commented 1 year ago

After some discussion over the last weeks regarding JSON-LD and how arrays and timestamps are represented (#139 and #125), and the resulting new JSON-LD context file proposed here: https://github.com/International-Data-Spaces-Association/ids-specification/issues/132#issuecomment-1658376829, I tried to think about the consequences of such changes and came back to this issue.

Luckily, @mkollenstart could also spend some time and we had some deeper discussions on the matter and came up with potential options to go forward with:

Option 1: JSON-LD allows remote / referenced documents anyway, and the text currently does NOT explicitly allow or disallow this. The proposal is that we explicitly allow this for hasPolicy (the entire Policy) or one level deeper, for the Rules inside a policy, e.g. under "permission": [

Currently we don't see a way to express this in a JSON-LD context directly, so I think it should be described in the text.

Option 2: Make referenced documents for hasPolicy and/or Rules the default. In many cases the information is very repetitive anyway. This would require 2 additional endpoints in the catalog interface: /policies/<id> and /rules/<id>

An advantage of Option 2 is that the id could be any id, including a non-resolvable one such as a UUID, and the endpoint would require the same auth mechanisms as the regular catalog interface.
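A minimal in-memory sketch of what the two lookups in Option 2 could look like. The class, the `authorized` flag (a stand-in for whatever auth mechanism the catalog interface uses), and the status codes are assumptions for illustration, not part of the DSP spec.

```python
from typing import Dict, Tuple


class PolicyRepository:
    """In-memory stand-in for the two proposed lookups."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, dict]] = {"policies": {}, "rules": {}}

    def add(self, kind: str, item_id: str, document: dict) -> None:
        """Register a document under 'policies' or 'rules'."""
        self._store[kind][item_id] = document

    def handle_get(self, path: str, authorized: bool) -> Tuple[int, dict]:
        """Map GET /policies/<id> or /rules/<id> to (status code, body)."""
        if not authorized:  # same auth as the regular catalog interface
            return 401, {}
        parts = path.strip("/").split("/", 1)
        if len(parts) != 2 or parts[0] not in self._store:
            return 404, {}
        document = self._store[parts[0]].get(parts[1])
        return (200, document) if document is not None else (404, {})


repo = PolicyRepository()
# The id does not need to be resolvable on its own, e.g. a plain UUID.
repo.add("rules", "3dd1add8-4d2d-569e-d634-8394a8836a88", {"action": "use"})

print(repo.handle_get("/rules/3dd1add8-4d2d-569e-d634-8394a8836a88", authorized=True))
print(repo.handle_get("/policies/unknown", authorized=True))
print(repo.handle_get("/rules/3dd1add8-4d2d-569e-d634-8394a8836a88", authorized=False))
```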

Also, the references can be http:// identifiers. In such cases, the biggest question is how to deal with authentication at such endpoints. Probably the easiest approach would be to say that such http references should point to publicly available endpoints, like schema.org and others. A consumer can always decide NOT to fetch from unknown endpoints!

I think a dataspace might also define such a policy and rules repository and also might define separate authentication mechanisms for it, e.g. checking a dataspace Membership Credential. But I think this is out of scope for the DSP spec itself.

Any further thoughts on this? Let's discuss this tomorrow in our weekly meeting.

--

Matthias Binzer

arnoweiss commented 2 months ago

Having this would remove a lot of constraints that the DSP places on the usage of Linked Data. I'm unsure whether having JSON schemas in addition to SHACL shapes would even be feasible in that case.

Also, there hasn't been any progress on this in ten months, so I suggest closing this ticket.

matgnt commented 1 month ago

I would consider this a potential optimization of the DSP; we just didn't work on it because we wanted to get the initial version 0.8 released. How should we deal with such open topics? We should not just close it.

arnoweiss commented 1 month ago

I'm opposed to increased flexibility in the protocol's message payloads. But yea, perhaps there should be a structured WG decision on this. @ssteinbuss - WDYT?

jimmarino commented 1 month ago

Work on DSP has now moved to Eclipse. We should raise new issues in the Eclipse WG.

ssteinbuss commented 1 month ago

That was discussed in our last call on Thursday. We will assess each issue in this repo and decide which to move to the Eclipse Project. I doubt that we should bring each issue of the project to the Working Group level.

International-Data-Spaces-Association / ids-specification

RFC: Introduce a 'Policy Repository' #77