datafoodconsortium / standard

This is the DFC standard on GitBook.
https://datafoodconsortium.gitbook.io/dfc-standard-documentation/
GNU Affero General Public License v3.0

# DFC Implementation of WebID and profile #10

Open RaggedStaff opened 9 months ago

RaggedStaff commented 9 months ago

Introducing WebID and profile

In the following proposal, any DFC user and platform would have their own WebID and profile(s). When dereferenced, a WebID would give information (profile) which can be used to find data.

The WebID could be used with Solid-OIDC for authentication and authorization instead of the email (currently used in the DFC token). This will be pursued in a future Major Version of the Standard.

Platform WebID and profile

For instance, a public WebID of a platform (https://ofn.org/card#me) could look like the following and contain links to pieces of data:

# This is the WebID profile document
<> a foaf:PersonalProfileDocument; # (note: there is a proposal to rename it to solid:profileDocument).
  foaf:primaryTopic <#me>.

<#me> a foaf:Agent; # this is the WebID profile, containing useful information to discover data.
  foaf:name "Open Food Network";
  pim:preferencesFile <link/to/the/preferences/file>; # could be used to restrict access to some information (see below).
  solid:publicTypeIndex <link/to/the/public/type/index>; # we can use TypeIndex or whatever vocabulary to advertise some index documents.
  dfc-b:EnterpriseByNameIndex <link/to/the/index/document>. # or we could list the indexes directly in the WebID using custom predicates that link to specific indexes, like this one.

If the platform wants to restrict access to some information, it could move it to a private or restricted (ACL) preferences file:

<> a pim:ConfigurationFile.

<https://ofn.org/card#me> 
  solid:publicTypeIndex <link/to/the/public/type/index>; # like before we can use TypeIndex or any vocabulary of our choice.
  dfc-b:EnterpriseByNameIndex <link/to/the/index/document>. # or direct predicates.

User's WebID and profile

The following is an example of a user's WebID (https://webid.provider/user/card#me):

<> a foaf:PersonalProfileDocument;
  foaf:primaryTopic <#me>.

<#me> a foaf:Person;
  foaf:name "Bob";
  pim:storage <https://socleo.fr/user/dfc/>, <https://ofn.org/user/dfc/>; # links to the user's data on different platforms.
  pim:preferencesFile <link/to/the/preferences/file>. # Used to restrict access to sensitive information (see below).

To restrict access to sensitive information, the user has moved some of it into the preferences file, which is protected by ACL:

<> a pim:ConfigurationFile.

# Here we are extending the profile of the user...
<https://webid.provider/user/card#me>
  dfc-b:agent # ...mapping the user to his corresponding dfc-b:Person on the different platforms he uses.
    <https://socleo.fr/user/dfc/agent/person/user>,
    <https://ofn.org/user/dfc/agent/person/user>.

# We also define a configuration to find the data.
<#config> a dfc-b:Configuration;
  solid:privateTypeIndex <link/to/the/private/type/index>; # TypeIndex could be used to list containers.
  dfc-b:enterpriseContainer <link/to/the/container>; # or use DFC custom direct predicates like this one.
  void:somePredicate <...>; # or we can use void datasets or any other vocabulary of our choice.
  dfc-b:EnterpriseByNameIndex
    <https://socleo.fr/user/dfc/enterpriseByNameIndex>,
    <https://ofn.org/user/dfc/enterpriseByNameIndex>.

Originally posted by @lecoqlibre in https://github.com/datafoodconsortium/standard/discussions/4#discussioncomment-7635937


RaggedStaff commented 9 months ago

I'm keen to get @balessan & the Startin'Blox crew's take on this: will this help you to align more closely with the DFC Standard, or move us further apart? I'm hoping the former, as you already support Solid-OIDC & that includes WebID, right?

I'm also a little wary as I don't think solid:publicTypeIndex is a well supported feature currently. For example ActivityPods doesn't have a clear roadmap to implement TypeIndexes, and the issue hasn't been updated since Oct '23. :worried:

@lecoqlibre - Could you propose an alternative to using solid:publicTypeIndex that is more widely supported ? I see you're proposing we have LDP containers defined with the DFC Standard as an alternative to solid:publicTypeIndex

RaggedStaff commented 8 months ago

@lecoqlibre Could you clarify where we got to on using this alongside OIDC, as part of the current implementation ?

Would we only need platforms to implement WebId?

lecoqlibre commented 6 months ago

The following is a non-normative proposal as a first step.

Ping @RaggedStaff @simonLouvet.

DFC WebId proposal v1

The main idea of this proposal is to find instances of DFC Workspace from a WebId. The workspace is a new concept we would add in our ontology. It describes a DFC environment and leads to the user's data. The DFC Workspace is hence the main concept to discover data from for both Solid and non-Solid apps. For non-Solid apps like most of the current DFC platforms the workspace will lead to API endpoints instead of resource documents. In that case the authentication token is used by the platform to provide only the data of the authenticated user.

The concept of DFC Workspace groups consistent DFC data together in an LDP container. It contains a single entry-point resource, say index, which describes the workspace and leads to the user's data through a TypeIndex. Users might have several DFC workspaces, especially when using Solid storages, but non-Solid platforms can provide just one.

TypeIndexes are part of this proposal because they already offer what we need: a way to discover resources of a certain type. Other mechanisms could be designed, but TypeIndexes are simple and powerful. They have also been widely used by the Solid community for a long time.

The WebId of a user and the WebId of a platform would be quite similar. This has the advantage that the same code (especially in the connector) can discover data regardless of whether the WebId is about a user or a platform.

Note: non-Solid DFC platforms just have to implement a WebId for the platform. There is no need to implement a WebId for users at this time.

So a user or platform data discovery would work the same starting from dereferencing the WebId:

  1. The WebId profile leads to the main TypeIndex (public or private);
  2. The main TypeIndex lists the available DFC Workspace(s);
  3. The Workspace describes the DFC environment and leads to the TypeIndex of the workspace;
  4. The TypeIndex of the workspace leads to the user's data.

Note: all these resources could be returned in a single document.

Example for a non-Solid DFC platform

The WebId profile document of the platform

Contains the WebId profile (https://platform.ex/profile/card#it) which is linked to a main TypeIndex (public in the example).

{
  "@graph": [
    {
      "@id": "https://platform.ex/profile/card",
      "@type": "foaf:PersonalProfileDocument",
      "foaf:primaryTopic": "https://platform.ex/profile/card#it"
    },
    {
      "@id": "https://platform.ex/profile/card#it",
      "@type": [ "foaf:Agent", "dfc:Platform" ],
      "foaf:name": "Name of the platform", 
      "solid:publicTypeIndex": "https://platform.ex/publicTypeIndex"
    }
  ]
}

Note: there were discussions about adding a foaf:ProfileDocument class to use instead of foaf:PersonalProfileDocument where it makes sense, as in this case. It seems this class still does not exist. We might keep the existing class or use something else.

The main TypeIndex of the platform

Leads to the DFC API entry point (Workspace LDP container) of the platform.

{
  "@graph": [
    {
      "@id": "https://platform.ex/publicTypeIndex",
      "@type": "solid:TypeIndex"
    },
    {
      "@id": "https://platform.ex/publicTypeIndex#1",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc:Workspace",
      "solid:instance": "https://platform.ex/dfc/"
    }
  ]
}

Note: we could have a direct link to the index file instead of a link to the Workspace container so the name of the resource could be anything. While this can be done, it is preferable to normalize the Workspace entry point. This will ensure the Workspace is self-sufficient as it won't depend on external semantics (so potential breaking changes could be limited).

The Workspace of the platform

The dfc:Workspace object is defined in the index file at the root of the Workspace LDP container. It leads to a TypeIndex (dfc:hasTypeIndex) which will list the data endpoints of the platform.

{
  "@graph": [
    {
      "@id": "https://platform.ex/dfc/index",
      "@type": "dfc:Workspace", 
      "dfc:hasTypeIndex": "https://platform.ex/dfc/typeIndex"
    }
  ]
}

We could imagine creating other predicates for the Workspace, such as a name.

The TypeIndex of the Workspace

Lists the DFC data endpoints of the platform. Here we discover that this platform provides an LDP container for dfc-b:Enterprise, another one for dfc-b:Person and also one for dfc-b:Catalog. So if we want to create a new dfc-b:Catalog, for instance, we know we should make an HTTP POST request to https://platform.ex/dfc/catalog/.

{
  "@graph": [
    {
      "@id": "https://platform.ex/dfc/typeIndex",
      "@type": "solid:TypeIndex"
    },
    {
      "@id": "https://platform.ex/dfc/typeIndex#1",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc-b:Enterprise",
      "solid:instanceContainer": "https://platform.ex/dfc/enterprise/"
    },
    {
      "@id": "https://platform.ex/dfc/typeIndex#2",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc-b:Person",
      "solid:instanceContainer": "https://platform.ex/dfc/person/"
    },
    {
      "@id": "https://platform.ex/dfc/typeIndex#3",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc-b:Catalog",
      "solid:instanceContainer": "https://platform.ex/dfc/catalog/"
    }
  ]
}
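
The four discovery steps can be traced in code. The following is a minimal Python sketch, not part of the proposal: it assumes trimmed copies of the four example documents above are held in memory (DOCS, fetch, nodes and discover_containers are hypothetical names), and it compares compact IRIs such as dfc:Workspace literally instead of running a real JSON-LD processor.

```python
# Trimmed, in-memory copies of the example documents, keyed by URL (assumption:
# a real client would fetch them over HTTP and expand the JSON-LD context).
DOCS = {
    "https://platform.ex/profile/card": {"@graph": [
        {"@id": "https://platform.ex/profile/card#it",
         "@type": ["foaf:Agent", "dfc:Platform"],
         "solid:publicTypeIndex": "https://platform.ex/publicTypeIndex"}]},
    "https://platform.ex/publicTypeIndex": {"@graph": [
        {"@type": "solid:TypeRegistration",
         "solid:forClass": "dfc:Workspace",
         "solid:instance": "https://platform.ex/dfc/"}]},
    "https://platform.ex/dfc/index": {"@graph": [
        {"@id": "https://platform.ex/dfc/index",
         "@type": "dfc:Workspace",
         "dfc:hasTypeIndex": "https://platform.ex/dfc/typeIndex"}]},
    "https://platform.ex/dfc/typeIndex": {"@graph": [
        {"@type": "solid:TypeRegistration",
         "solid:forClass": "dfc-b:Catalog",
         "solid:instanceContainer": "https://platform.ex/dfc/catalog/"}]},
}


def fetch(url):
    """Stand-in for an HTTP GET: drop the fragment and look the document up."""
    return DOCS[url.split("#")[0]]


def nodes(doc, rdf_type=None):
    """Yield the nodes of a JSON-LD @graph, optionally filtered by @type."""
    for node in doc.get("@graph", []):
        types = node.get("@type", [])
        types = [types] if isinstance(types, str) else types
        if rdf_type is None or rdf_type in types:
            yield node


def discover_containers(webid):
    # Step 1: the WebId profile leads to the main TypeIndex.
    profile = next(n for n in nodes(fetch(webid)) if n.get("@id") == webid)
    main_index = fetch(profile["solid:publicTypeIndex"])
    # Step 2: the main TypeIndex lists the workspace container(s).
    workspaces = [r["solid:instance"]
                  for r in nodes(main_index, "solid:TypeRegistration")
                  if r.get("solid:forClass") == "dfc:Workspace"]
    found = {}
    for container in workspaces:
        # Step 3: the normalized "index" resource describes the workspace.
        workspace = next(nodes(fetch(container + "index"), "dfc:Workspace"))
        # Step 4: the workspace TypeIndex leads to the data containers.
        for reg in nodes(fetch(workspace["dfc:hasTypeIndex"]),
                         "solid:TypeRegistration"):
            found[reg["solid:forClass"]] = reg["solid:instanceContainer"]
    return found
```

Calling discover_containers("https://platform.ex/profile/card#it") on these trimmed documents maps dfc-b:Catalog to its container URL; the Enterprise and Person registrations were left out of DOCS only to keep the sketch short.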

Regarding DFC Solid applications

Solid DFC applications will use user WebIds, provided by the Solid-OIDC authentication method. These WebIds should follow the Solid WebId profile specification.

DFC workspaces (dfc:Workspace) should be discovered the same way as they are discovered for a DFC non-Solid platform.

Note about trusted platforms

In this version of the proposal, the way trusted platforms are stored is left to the platform. So the WebIds of trusted/associated platforms could be stored by any means the platform chooses.

Do we want platforms to expose their trusted platforms? If so we should add a predicate somewhere in the discoverability chain. The acl:trustedApp might be used.

Note about resource containment

The platform Workspace LDP root container should contain all other Workspace containers and resources as children (ldp:contains).

This is to ease future Solid compliance. This file tree also offers some advantages when exchanging a Workspace, as the Workspace folder will contain every resource it needs (self-sufficient).

Note: this is not constraining how the sub hierarchy is structured. Any sub level with any path could be used as long as the paths are contained in the Workspace root container.

Here is an example of a Workspace root LDP container:

{
  "@graph": [
    {
      "@id": "https://platform.ex/dfc/",
      "@type": "ldp:BasicContainer", 
      "ldp:contains": [ "https://platform.ex/dfc/index", "https://platform.ex/dfc/catalog/", "https://platform.ex/dfc/some/child/path/enterprise/" ]
    }
  ]
}
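
A client or a test suite could check the containment rule with a simple URL comparison. The sketch below is one possible reading of "contained in the Workspace root container", not a normative check: the function name is_contained is hypothetical, and it only compares origin and path prefix rather than following ldp:contains links.

```python
from urllib.parse import urlsplit


def is_contained(root, url):
    """Return True if `url` is on the same origin as `root` and strictly below its path."""
    r, u = urlsplit(root), urlsplit(url)
    # Normalize the root path so that "/dfc" and "/dfc/" behave the same.
    root_path = r.path if r.path.endswith("/") else r.path + "/"
    same_origin = (r.scheme, r.netloc) == (u.scheme, u.netloc)
    return same_origin and u.path.startswith(root_path) and u.path != root_path
```

With the root https://platform.ex/dfc/, the deeply nested https://platform.ex/dfc/some/child/path/enterprise/ passes, while a container on another path or host does not.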

RaggedStaff commented 6 months ago

Thanks for putting this together @lecoqlibre ! I think this all seems to make general sense.

I'd like to comment on some details. Would it be worth moving this content to a .md file & start a PR ? Then we can comment on individual points more easily.

Also pinging @balessan @mkllnk for their thoughts... Will this make it easier for SiB to integrate with DFC ? Are there potential issues for OFN implementing this ?

mkllnk commented 6 months ago

Thank you. The points of compatibility and discoverability make sense. I don't understand the examples well enough though to completely understand. Could we use more realistic data in the examples?

From this, I guess that each platform will have one endpoint to create orders, for example. Unless we create a workspace for each enterprise to have its own order creation endpoint?

Is TypeRegistration a container of existing data, for example products? Or is it to create products?

lecoqlibre commented 6 months ago

@RaggedStaff I agree making a .md in a PR would be better. But I didn't know where to put that file. I think we should work on rewriting the standard and maybe move to ReSpec someday? Maybe for now we can put this into the "Technical specifications" section maybe in a sub section called "Data discovery"?

@mkllnk The current proposal assumes that non-Solid DFC platforms provide a single DFC workspace. It's an API endpoint which serves all users based on the email found in the OIDC token. But a non-Solid DFC platform can also provide a different workspace URL per user. For instance, if the user "user@ex.org" is authenticated, the platform will have to retrieve his workspace by whatever means it used to store it and provide the link, let's say https://platform.ex/user/workspace/, in the main TypeIndex like:

{
  "@graph": [
    {
      "@id": "https://platform.ex/privateTypeIndex",
      "@type": "solid:TypeIndex"
    },
    {
      "@id": "https://platform.ex/privateTypeIndex#1",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc:Workspace",
      "solid:instance": "https://platform.ex/user/workspace/"
    }
  ]
}
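
On the platform side, producing this per-user document could look like the sketch below. It is only an illustration under assumptions: the WORKSPACES mapping stands in for whatever storage the platform already uses, and the names WORKSPACES and private_type_index are hypothetical.

```python
# Hypothetical store mapping the email found in the OIDC token to a workspace URL.
WORKSPACES = {"user@ex.org": "https://platform.ex/user/workspace/"}


def private_type_index(email, index_url="https://platform.ex/privateTypeIndex"):
    """Build the main TypeIndex document for the authenticated user."""
    workspace = WORKSPACES[email]  # raises KeyError for unknown users
    return {"@graph": [
        {"@id": index_url, "@type": "solid:TypeIndex"},
        {"@id": index_url + "#1",
         "@type": "solid:TypeRegistration",
         "solid:forClass": "dfc:Workspace",
         "solid:instance": workspace},
    ]}
```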

With this kind of setup it may be better to use a private TypeIndex, which should be discovered through the preferences file (see the TypeIndex spec). In fact, we should maybe use only private TypeIndexes, as DFC data is supposed to be private.

A TypeRegistration is an entry of a TypeIndex. In the TypeIndex of the Workspace, registrations lead to LDP containers where a particular type of data can be found (e.g. dfc-b:Catalog or dfc-b:Enterprise). These containers are used to GET data but also to POST, PUT or PATCH. A platform could also provide resources (documents) instead of containers using the solid:instance predicate.
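
To make the difference between the two predicates concrete, here is a small Python sketch that reads a TypeIndex and separates containers (solid:instanceContainer) from single resources (solid:instance) for one class. The function name locations_for is hypothetical, and compact IRIs are compared literally as an assumption; a real client would expand them with a JSON-LD processor.

```python
def locations_for(type_index, rdf_class):
    """Return (containers, instances) registered for `rdf_class` in a TypeIndex document."""
    containers, instances = [], []
    for node in type_index.get("@graph", []):
        if node.get("@type") != "solid:TypeRegistration":
            continue  # skip the TypeIndex node itself
        if node.get("solid:forClass") != rdf_class:
            continue
        if "solid:instanceContainer" in node:
            containers.append(node["solid:instanceContainer"])
        if "solid:instance" in node:
            instances.append(node["solid:instance"])
    return containers, instances
```

A client would POST new resources to a container returned in the first list and simply GET or update the documents returned in the second.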

I don't see how to provide more realistic data as only the "https://platform.ex" and "Name of the platform" should be respectively replaced by the real platform URL and its name.

mkllnk commented 6 months ago

Thank you, @lecoqlibre. That answers a lot. So we could have https://openfoodnetwork.org.au/api/dfc/index to display the private type index. That URL is referenced in a webid file? Where does that come from?

lecoqlibre commented 6 months ago

@mkllnk A platform's WebId is just a simple document you put at some URL on the platform. When dereferenced, the WebId gives information about the platform. A platform will have to store the WebIds of the platforms it interacts with. WebIds can be seen as an entry point to discover data from.

To use a main private TypeIndex we should pass through the pim:preferencesFile to find the preferences file in which we will find a link to the private TypeIndex.

Below is a full example for OFN. Here is the WebId using a preferences file:

{
  "@graph": [
    {
      "@id": "https://openfoodnetwork.org.au/profile/card",
      "@type": "foaf:PersonalProfileDocument",
      "foaf:primaryTopic": "https://openfoodnetwork.org.au/profile/card#it"
    },
    {
      "@id": "https://openfoodnetwork.org.au/profile/card#it",
      "@type": [ "foaf:Agent", "dfc:Platform" ],
      "foaf:name": "Open Food Network", 
      "pim:preferencesFile": "https://openfoodnetwork.org.au/preferences"
    }
  ]
}

The preferences file of the platform looks like:

{
  "@graph": [
    {
      "@id": "https://openfoodnetwork.org.au/preferences",
      "@type": "space:ConfigurationFile"
    },
    {
      "@id": "https://openfoodnetwork.org.au/profile/card#it",
      "solid:privateTypeIndex": "https://openfoodnetwork.org.au/privateTypeIndex"
    }
  ]
}

The private TypeIndex of the platform:

{
  "@graph": [
    {
      "@id": "https://openfoodnetwork.org.au/privateTypeIndex",
      "@type": "solid:TypeIndex"
    },
    {
      "@id": "https://openfoodnetwork.org.au/privateTypeIndex#1",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc:Workspace",
      "solid:instance": "https://openfoodnetwork.org.au/api/dfc/"
    }
  ]
}
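
The hop from the WebId through the preferences file to the private TypeIndex can be sketched as follows. This is a minimal illustration, assuming the documents are JSON-LD with compact IRIs as shown above and that a caller-supplied fetch function returns the parsed document for a URL; main_type_indexes is a hypothetical name.

```python
def main_type_indexes(webid, fetch):
    """Collect TypeIndex URLs stated about `webid` in its profile and preferences file."""
    def statements(doc):
        # Every node in the document whose subject is the WebId itself.
        return [n for n in doc.get("@graph", []) if n.get("@id") == webid]

    found = []
    # The profile document URL is the WebId without its fragment.
    for node in statements(fetch(webid.split("#")[0])):
        if "solid:publicTypeIndex" in node:
            found.append(node["solid:publicTypeIndex"])
        prefs_url = node.get("pim:preferencesFile")
        if prefs_url:
            # The preferences file asserts further triples about the same WebId.
            for extra in statements(fetch(prefs_url)):
                if "solid:privateTypeIndex" in extra:
                    found.append(extra["solid:privateTypeIndex"])
    return found
```

In a real deployment the fetch for the preferences file would carry the authentication token, since that file is expected to be access-controlled.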

Then we can access the workspace by dereferencing the index resource in the workspace container:

{
  "@graph": [
    {
      "@id": "https://openfoodnetwork.org.au/api/dfc/index",
      "@type": "dfc:Workspace", 
      "dfc:hasTypeIndex": "https://openfoodnetwork.org.au/api/dfc/typeIndex"
    }
  ]
}

Following the value we got from the dfc:hasTypeIndex predicate in the index, we can dereference the TypeIndex of the workspace:

{
  "@graph": [
    {
      "@id": "https://openfoodnetwork.org.au/api/dfc/typeIndex",
      "@type": "solid:TypeIndex"
    },
    {
      "@id": "https://openfoodnetwork.org.au/api/dfc/typeIndex#1",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc-b:Enterprise",
      "solid:instanceContainer": "https://openfoodnetwork.org.au/api/dfc/enterprise/"
    },
    {
      "@id": "https://openfoodnetwork.org.au/api/dfc/typeIndex#2",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc-b:Person",
      "solid:instanceContainer": "https://openfoodnetwork.org.au/api/dfc/person/"
    },
    {
      "@id": "https://openfoodnetwork.org.au/api/dfc/typeIndex#3",
      "@type": "solid:TypeRegistration",
      "solid:forClass": "dfc-b:Catalog",
      "solid:instanceContainer": "https://openfoodnetwork.org.au/api/dfc/catalog/"
    }
  ]
}

At this point we know where to GET, POST, PUT, PATCH and DELETE data: we obtained the data endpoints of the platform.
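
As an illustration of using a discovered endpoint, the sketch below prepares (but does not send) an HTTP POST to a container. Several details are assumptions rather than part of the proposal: the Bearer scheme for the token, the application/ld+json content type, the shape of the payload, and the function name build_create_request.

```python
import json
from urllib.request import Request


def build_create_request(container_url, resource, token):
    """Prepare a POST that would create `resource` in the given LDP container."""
    body = json.dumps(resource).encode("utf-8")
    return Request(
        container_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/ld+json",  # assumed serialization
            "Authorization": "Bearer " + token,      # assumed token scheme
        },
    )
```

The request object can then be passed to urllib.request.urlopen, or the same parameters can be given to any other HTTP client.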

Note that there are two type indexes: one of the platform used to discover the workspace (from the WebId); and one in the workspace to discover data endpoints.

mkllnk commented 6 months ago

Nice example, thank you!

My main question was aiming at knowing if there is a well known URL somewhere but it looks like the web id URL has to be stored somewhere to connect to that platform.

Within DFC, I guess that we could link straight to the type index of the workspace but the web id makes it compatible with other systems, right?

You mentioned the possibility of linking the web id to the id in OIDC. If we use a generic URL like https://openfoodnetwork.org.au/profile/card then it doesn't identify the person. We still need a unique id per account, right?

lecoqlibre commented 6 months ago

My main question was aiming at knowing if there is a well known URL somewhere but it looks like the web id URL has to be stored somewhere to connect to that platform.

Yes currently the WebIds of interconnected platforms have to be stored. If we have some kind of a well-known we would also have to store the hostname of interconnected platforms. So anyhow we have to store something and I thought that storing the WebId was OK. Do you and the others think we should rely on the hostname of the platform instead? If so, several ideas come to my mind:

Within DFC, I guess that we could link straight to the type index of the workspace but the web id makes it compatible with other systems, right?

Right. But then you need a discovery mechanism. That's why we are proposing WebIds which would make our discovery mechanism compatible with the Solid ecosystem. This is a first step on the road to Solid compatibility.

You mentioned the possibility of linking the web id to the id in OIDC. If we use a generic URL like https://openfoodnetwork.org.au/profile/card then it doesn't identify the person. We still need a unique id per account, right?

In Solid, WebIds are used to authenticate and authorize users. The same WebId is used across all the applications a user works with. A user's WebId is defined in one place only.

This proposal does not propose WebIds for users of non-Solid DFC platforms. This is not needed for now, as Solid-OIDC is not the authentication mechanism of the DFC standard. It will be required once the DFC is ready to move closer to Solid compliance. The DFC platforms would then be able to use Solid PODs/Storages! But for now Solid and non-Solid DFC platforms are not compatible (they will need a compatibility layer at the authentication level).

simonLouvet commented 6 months ago
https://openfoodnetwork.org.au/profile/card#it

I don't understand why you have to go through https://openfoodnetwork.org.au/privateTypeIndex to reach https://openfoodnetwork.org.au/api/dfc/typeIndex.

Why not reference https://openfoodnetwork.org.au/api/dfc/typeIndex in https://openfoodnetwork.org.au/profile/card#it?

lecoqlibre commented 6 months ago

Indeed, there are two type indexes:

I propose this distinction for several reasons:

Using the same discovery mechanism for Solid and non-Solid DFC platforms presents some advantages like:

Note: the private TypeIndex should be discovered through the preferences file and not directly from the WebId.

Here are a few more details on DFC Solid apps: In a DFC Solid app, a user opens a workspace he wants to work with. To do so he browses his storage(s) (or PODs) and selects a DFC workspace (which is an LDP container). For instance, on one of his storages, he could have a workspace at /Documents/MyFarm/. When he opens it, the application can display the information contained in the workspace by following the links found in the TypeIndex of the workspace.

simonLouvet commented 5 months ago

After speaking with @maxime, I understood that https://openfoodnetwork.org.au/privateTypeIndex refers to a user's data whereas https://openfoodnetwork.org.au/api/dfc/index was referring to the platform's data.

I also understood that https://openfoodnetwork.org.au/profile/card#it was referenced in the https://openfoodnetwork.org.au/preferences resource because it is the solid specification.

I don't understand where https://openfoodnetwork.org.au/api/dfc/index is referenced from the WebId. Is it this part?

      "@type": "dfc:Workspace", 
      "dfc:hasTypeIndex": "https://platform.ex/dfc/typeIndex"

Overall, I agree with the specification proposed by maxime.

lecoqlibre commented 5 months ago

After speaking with @maxime, I understood that https://openfoodnetwork.org.au/privateTypeIndex refers to a user's data whereas https://openfoodnetwork.org.au/api/dfc/index was referring to the platform's data.

I would say it's the opposite :) The TypeIndex(es) of the platform, whether public or private, must contain a link to a user's workspace. In the case of a non-Solid platform, this workspace will likely be an endpoint which provides the data of the authenticated user.

I also understood that https://openfoodnetwork.org.au/profile/card#it was referenced in the https://openfoodnetwork.org.au/preferences resource because it is the solid specification.

I would say it's not only because of the Solid specification. It's also a feature of the Semantic Web. The Web is made of documents (they could be stored in a file system or in a quadstore) and these documents can state different information about a thing. In the case of the Solid preferences document, a new triple is asserted about the WebId and contains a link to the private TypeIndex of this WebId's owner (the platform). It can be called an "extended profile".

I don't understand where https://openfoodnetwork.org.au/api/dfc/index is referenced from the webId.

To discover the workspace endpoint of a non-Solid platform, one has to find a TypeIndex registration for the class dfc:Workspace in the entire profile (public profile + possibly the extended profile) of the platform. In the example above, the platform is using a private TypeIndex. So to find the workspace endpoint, the client will have to follow its nose to find the private TypeIndex and then the TypeIndex registration for the class dfc:Workspace.

simonLouvet commented 5 months ago

To discover the workspace endpoint of a non-Solid platform, one has to find a TypeIndex registration for the class dfc:Workspace in the entire profile (public profile + possibly the extended profile) of the platform. In the example above, the platform is using a private TypeIndex. So to find the workspace endpoint, the client will have to follow its nose to find the private TypeIndex and then the TypeIndex registration for the class dfc:Workspace.

There's still something I don't understand. https://openfoodnetwork.org.au/privateTypeIndex#1 mentions that it exists for the dfc:Workspace class and for the https://openfoodnetwork.org.au/api/dfc/ server, but how can it find the https://openfoodnetwork.org.au/api/dfc/index resource?

lecoqlibre commented 5 months ago

https://openfoodnetwork.org.au/privateTypeIndex#1 mentions that it exists for the dfc:Workspace class and for the https://openfoodnetwork.org.au/api/dfc/ server, but how can it find the https://openfoodnetwork.org.au/api/dfc/index resource?

The privateTypeIndex#1 currently references an ldp:Container. The name of the index document is normalized; it's static, as I said previously:

Note: we could have a direct link to the index document instead of a link to the Workspace container so the name of the resource could be anything. While this can be done, it is preferable to normalize the Workspace entry point. This will ensure the Workspace is self-sufficient as it won't depend on external semantics (so potential breaking changes could be limited).

This means that the index document is known by clients. Once they have found the workspace container they know they have to load the "index" document at the root of that container.

I'm open to discuss this point.

Maybe it would be better to reference directly a document within the container in the TypeIndex? This way the root document can be named anything. For instance the Solid chat client-to-client standard states that the root document should normally be "index":

Within that folder, the main channel is normally $ROOT/index#this

So I guess if the client does not find an index document it should search for an instance of a dfc:Workspace at the root of the container.

This seems to be more flexible and should be as efficient as a fixed document in most situations. It should only be less efficient when the root document is named something other than "index" (as a search operation would be needed).

In the end, I think I would be in favor of using a dynamic root document which should normally be "index".
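
The "index first, then search" lookup could be sketched like this. It is only an illustration of the idea under assumptions: fetch is a caller-supplied function returning the parsed JSON-LD document for a URL (or None when it does not exist), the container lists its members with ldp:contains, and find_workspace is a hypothetical name.

```python
def find_workspace(container_url, fetch):
    """Return the dfc:Workspace node of a container: try "index", then search."""
    def workspace_in(doc):
        for node in (doc or {}).get("@graph", []):
            types = node.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "dfc:Workspace" in types:
                return node
        return None

    # Fast path: the conventional root document.
    found = workspace_in(fetch(container_url + "index"))
    if found:
        return found
    # Fallback: inspect every document the container lists directly.
    for node in (fetch(container_url) or {}).get("@graph", []):
        members = node.get("ldp:contains", [])
        members = [members] if isinstance(members, str) else members
        for member in members:
            if member.endswith("/"):
                continue  # child containers are not root documents
            found = workspace_in(fetch(member))
            if found:
                return found
    return None
```

The fallback costs one extra request per root-level document, which is the efficiency trade-off mentioned above.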