dcmi / lrmi

LRMI in RDF
https://www.dublincore.org/about/lrmi/

Disposition of useRightsUrl #5

Closed stuartasutton closed 4 years ago

stuartasutton commented 9 years ago

The useRightsUrl property was brought over to DCMI in the transition as part of the LRMI-approved version 1.1. [1] As we know, the useRightsUrl was not adopted by schema.org given the schema.org/license property (https://schema.org/license). As part of the RDF encoding of 1.1, the useRightsUrl property was included [2] and declared as a subproperty of the schema.org/license property.

At the time schema.org adopted the license property, there was an ongoing conversation on the LRMI GG about refining lrmi:useRightsUrl with subproperties that would better meet the needs of certain use cases (get GG message permalinks). I know that Steve and Brandt were keen on refinement. That refinement conversation stalled in light of the schema.org/license adoption and the shift of LRMI focus to the transition of the specification's stewardship.

So, the question: Where do we stand on the lrmi:useRightsUrl status and refinements discussion?

[1] http://dublincore.org/dcx/lrmi-terms/1.1/ [2] http://dublincore.org/dcx/lrmi-terms/#useRightsUrl

science commented 9 years ago

Background

My input (this is Steve Midgley) on this one is to have two fields related to use rights. One field is for describing the actual license attached to the resource. This should be "useRightsUrl", I think. The other is a field which describes how you can access a resource via its license. For example, useRightsUrl isn't very useful in accomplishing that if it links to an "All Rights Reserved" page. One criticism raised to me in the past by a few publishers is that useRightsUrl has an inherent bias towards openly licensed resources. Closed-license resources can't communicate how users can access and use them via useRightsUrl, because there are complicated restrictions that can't be codified in a universal license.

Use-case / Issue

This proposed alternative field (call it xyzUrl in this post), designed to address this issue, would indicate, for example, what class of users might have access to the resource, or to which collection the resource belongs. In Brandt's use-case, the need is as follows: SBAC publishes a set of resources that are accessible to any teacher in any of the states that are members of SBAC. Each state (for example) is given an LTI key they can use to auth/n/z their users into the SBAC repository. SBAC also publishes a set of resources which are freely accessible to anyone under CC-By-3. The question is how do we signal in the metadata that a state LOR should use a given LTI key to log a user into the SBAC repository, or even show the user that content (with the knowledge that the user could log in if they click on the resource)? The counter-purpose is to avoid having LORs show content to users that the users can't access because (in this example) they teach in a state which is not an SBAC member.

Proposed solution

The proposal to address this is to have a second field, xyzURL, that allows SBAC to attach access rights descriptions to resources. In this example it might be xyzURL: http://sbac.org/access/2015/member

Then when they distribute their LTI keys to the LORs (state, non profit and commercial entities) they can tell the LOR operators "only show resources with the xyzURL field matching http://sbac.org/access/2015/member to users in the following states, and only use this LTI key we're giving you for those authorized users."
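
As a rough illustration of the rule SBAC would hand to LOR operators, here is a minimal Python sketch. The field name xyzURL is the placeholder used in this post; the state codes, the record layout, and the `visible_resources` function are assumptions made up for the example and are not part of LRMI or LTI.

```python
# Hypothetical sketch: "xyzURL" is the placeholder field name from this post.
SBAC_MEMBER_ACCESS = "http://sbac.org/access/2015/member"


def visible_resources(resources, user_state, member_states,
                      access_url=SBAC_MEMBER_ACCESS):
    """Return the resources an LOR should list for a user from user_state.

    Resources tagged with access_url are shown only to users in
    member_states. Resources with no xyzURL, or with a different value,
    pass through unchanged; a real LOR would need its own rule for those.
    """
    shown = []
    for resource in resources:
        restricted = resource.get("xyzURL") == access_url
        if restricted and user_state not in member_states:
            continue  # user could not open this resource, so hide it
        shown.append(resource)
    return shown


# Placeholder state codes, not real SBAC membership data.
members = {"AA", "BB"}
catalog = [
    {"url": "http://sbac.org/1", "xyzURL": SBAC_MEMBER_ACCESS},
    {"url": "http://sbac.org/2"},  # freely accessible resource
]
print([r["url"] for r in visible_resources(catalog, "ZZ", members)])
```

The LTI key itself never appears in the metadata here; the LOR keeps it privately and uses the URL only as the matching value.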

Other use-cases

This is just one use-case but hopefully it demonstrates why we want a second field in addition to the license field. Other use-cases include for-profit publishers who wish to distribute metadata publicly but want to only have paying customers display those resources to their users (and they want to add new customers easily). The license field in this example would just be a link to the SBAC copyright page or something similar.

Naming of the field

There have been various proposals on the name of this field (xyzUrl). As I recall they include:

Personally, I don't care what the field is called so long as it is meaningful. I mildly prefer accessRightsUrl because it is shorter to type.

Re-use of a single field?

I think one question to surmount is whether the useRightsUrl field could be appropriated for this purpose (and so avoiding the need for a second field). IIRC, Phil Barker argued (on LRMI list) strongly and with evidence that this was a bad idea, and I'm inclined to agree with him.

stuartasutton commented 9 years ago

Just a brief note on "Re-use of a single field"--technically, we cannot "appropriate" (reuse) an existing property where that means changing its meaning/semantics. It violates the core tenet of "persistence" on which trust and reliability rest (and violates the DCMI namespace policy):

"If ... such changes of meaning are likely to have substantial impact on either machine processing of DCMI terms or the functional semantics of the terms, then these changes will be reflected in a change of URI for the DCMI term or terms in question."

In other words, a URI that names a property or class can never be "reused" or "appropriated" through change of meaning because it places relying systems at risk of breaking. Substantive changes result in a NEW property or class. Properties/classes can be deprecated (no longer present for preferred use), but they are never deleted or semantics changed in harmful ways...never make legacy systems break.

In the LRMI RDF schema at http://dublincore.org/dcx/lrmi-terms/#useRightsUrl, we have declared lrmi:useRightsUrl as a subproperty of schema:license. If we were to create a property for defining access restrictions (lrmi:accessRestrictionsUrl or lrmi:accessRightsUrl), it might also be declared a subproperty of schema:license.

My only question is whether making this distinction between "use" and "access" means that existing systems that have used lrmi:useRightsUrl for BOTH purposes would then be "wrong" (or half wrong) because we have narrowed the meaning (and application) of the property? What is the risk and what are the consequences?

science commented 9 years ago

I don't think there is much risk of breaking existing applications which are overloading useRightsUrl - I'm not aware of anyone abusing/extending that field right now. It's that a few of us want to communicate this information and currently don't have a method.

One possible approach would be to give us a name and a provisional space and let us use it. If it turns out to be widely used (say in the Learning Registry at least), then it would be worth bringing all the way into the standard.

stuartasutton commented 9 years ago

Well, it seems to me like there are two options given a perceived need to do something:

  1. Deprecate useRightsUrl and create two new properties reflecting the identified needs; or
  2. Add a new property that satisfies the unfulfilled need and continue using the useRightsUrl property as-is but with an implied narrowing of use.

stuartasutton commented 9 years ago

And another option would be to utilize the ontologically "soft" schema.org/supersededBy (https://schema.org/supersededBy) with the lrmi:useRightsUrl property instead of deprecation. Thus:

lrmi:useRightsUrl schema:supersededBy schema:license

Then, we will have: (1) preserved lrmi:useRightsUrl so we don't break any systems; (2) pointed lrmi:useRightsUrl to schema:license as the general licensing property by gently noting in documentation and in the LRMI schema that useRightsUrl has been superseded; and then (3) create two new LRMI subproperties to cover the more granular aspects identified by the use cases (i.e., the suggested lrmi:accessRestrictionsUrl & lrmi:accessRightsUrl). Sound reasonable?

I've suggested a new work package for next steps in the LRMI schema development (http://wiki.dublincore.org/index.php/AB-Comm/ed/LRMI/TG#wp4) where we might move a solution forward quickly.

stuartasutton commented 9 years ago

Previous message not that clear...sorry. In sum: (1) lrmi:useRightsUrl is superseded by schema:license; then (2) two new LRMI subproperties of schema:license cover both use case needs.
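
To make the summary concrete, the arrangement can be sketched as a handful of triples in plain Python. This is only an illustration: the two access properties are the names floated in this thread, not adopted LRMI terms, and `current_property` is a hypothetical helper rather than anything from an RDF library.

```python
# (subject, predicate, object) triples written with prefixed names.
# The two access properties are proposals from this thread only.
TRIPLES = [
    ("lrmi:useRightsUrl", "schema:supersededBy", "schema:license"),
    ("lrmi:accessRestrictionsUrl", "rdfs:subPropertyOf", "schema:license"),
    ("lrmi:accessRightsUrl", "rdfs:subPropertyOf", "schema:license"),
]


def current_property(prop, triples=TRIPLES):
    """Follow schema:supersededBy links to the property now preferred.

    The old property stays in the triples, so legacy data that uses it
    still resolves; nothing is deleted or redefined.
    """
    successor = {s: o for s, p, o in triples if p == "schema:supersededBy"}
    seen = set()
    while prop in successor and prop not in seen:  # guard against cycles
        seen.add(prop)
        prop = successor[prop]
    return prop


print(current_property("lrmi:useRightsUrl"))
```

A consumer that meets lrmi:useRightsUrl in older records can keep reading it and simply treat the value as a schema:license value.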

science commented 9 years ago

Great looking. This introduces a little confusion for me: I thought that accessRightsUrl == useRightsUrl == license. I wasn't aware there was a use case for any of those fields that wasn't satisfied by any other field? The idea, in my mind was as follows:

Those are the only use-cases I'm aware of - which suggests that "accessRightsUrl" is not necessary, and in my opinion it just complicates understanding what the differences between the three rights fields are. Having just license and accessRestrictionsUrl would do everything I'm advocating for, I think, and be more self-documenting. What is the use-case for accessRightsUrl? Other thoughts?

science commented 9 years ago

Per our conversation, I'm proposing a new solution.

license: url

collection: CreativeWork (and primarily url)

One use-case for collection is to permit organizations to communicate the existence of a closed collection of resources. Consumer/search organizations who have access keys (such as LTI) to the collection, can determine which of their users are entitled to the keys, and display the collection in search only for users who are capable of viewing the content.

stuartasutton commented 9 years ago

Steve, this is timely. There has been conversation and work on a class (type) for Collection in the Bib extend community [1] & [2] with submission to schema.org with a "target date for consideration" of last Friday (April 24th).

So my immediate take away from our skype conversation and your post here, Steve, is that the use case matters driving this conversation are largely around non-public licenses enabling access to 'collections' of learning resources--e.g., through mechanisms such as LTI, licensing keys etc. that, for example, a school district or some other entity might deploy. So, the license of concern hangs off a collection of LRs:

LR ===collection===> COL (LR belongs to a 'collection')
COL ===license===> "some license" [text/URI] (COL has a non-public license)

Am I on target, Steve, with the use case? Other folks? Brandt?

If so, this doesn't appear to me to be an immediate matter of LRMI coining some new license property, if what the library community is proposing for schema.org with a Collection entity and the existing schema.org/license suffices.

[1] http://www.w3.org/community/schemabibex/wiki/Bib.schema.org-1.0 [2] http://sdo-bib.appspot.com/Collection?ext=bib

stuartasutton commented 9 years ago

AND, if this is correct, the matter for us becomes "best practice" documentation on how a learning resource description can be tied to a collection to which the resource belongs.
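
One way such a description could be tied together, sketched with plain Python dictionaries. The keys "collection" and "license" follow the names used in this thread; the example.org URLs and the `collection_license` helper are assumptions for illustration, not an agreed LRMI or schema.org serialization.

```python
# Hypothetical learning resource (LR) record pointing at its collection (COL).
learning_resource = {
    "url": "http://example.org/lr/1",
    "collection": {
        "url": "http://example.org/collection/abc",
        # The non-public license hangs off the collection, not the LR.
        "license": "http://example.org/license/members-only",
    },
}


def collection_license(resource):
    """Follow LR -> collection -> license; None if either link is missing."""
    collection = resource.get("collection")
    if not collection:
        return None
    return collection.get("license")


print(collection_license(learning_resource))
```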

science commented 9 years ago

Well very interesting! I was worried that schema.org would have a long process for adopting our suggestion of a collection as a CreativeWork array (or similar) but it looks like they have already done it!

In which case, yes, our approach should be to include Collections in LRMI somewhere as "guidance" or best practice for how to deal with things.

As far as I'm concerned, my use case can be solved with Schema "license" and "collections" - and no additional fields. So deprecate useRightsUrl or whatever you choose. My approach will be to steer the community I'm working with to use collections to solve the problem we have.

And your depiction of how an LO connects to a collection and the collection connects to an access key like LTI is exactly right. (And how a collection can have a license itself, which can be used to signal whether the collection requires LTI or whatever or is itself open).

I know that IMS, for example, is providing GPL licensed material to its members, but through a closed portal. So the resources are open, but the collection is closed! Even that use case can be handled by this proposed approach we're discussing.

I'd say "lock it in" and move onto other problems. Let me know what I can do to help document this (or what documentation/best practices you want in LRMI itself).

science commented 9 years ago

Looping in @bredd to the discussion so he can take a look at the state of the state. Thoughts @bredd?

stuartasutton commented 9 years ago

Steve, I think schema.org/Collection is not yet in schema.org but is inevitable. I _think_ the current status is that the Bib community has proposed it. I also notice that there is a revision in a name-change pull request from @vholland to rename the schema.org/collection property to targetCollection with type of ItemList, Thing. It still has a domainIncludes of schema.org/UpdateAction, but that should not preclude use with CreativeWorks.

bredd commented 9 years ago

Thanks to Steve (science) for drawing my attention to this thread.

I'll tackle the easy parts first:

Thus, open licenses (like CC-BY) would be indicated using schema:license . Restrictive licenses could also be indicated this way but they would need to be augmented with information on how to gain access.

Now for the harder part, indicating how to gain access: If I understand the thread correctly there are two propositions on the table.

Proposition 1

First is what has been variously called xyzURL, accessRestrictionsURL, useRestrictionsURL, etc. The value of this field is a lookup key that the subscribing application uses to determine whether the user has access to the resource (presumably by way of an encryption key). This is the method I've been advocating. Only, I don't like any of the proposed names, since none of them adequately conveys the idea of using the value to look up an access mechanism on behalf of the user.

Let's start with a clear definition of the field. Here's a draft: "The identifier of an access-controlled collection that may be licensed to a user or organization. Applications can use this identifier to look up the mechanism (including protocol and keys) needed to gain access to the content."

That leads me to nominate accessManagedCollectionURL, accessManagementURL, or some variation on this theme.

Proposition 2

This is not too dissimilar to Proposition #1 once you traverse the abstractions. Essentially you use the emerging collection attribute to indicate that the item belongs to a collection. Then, certain yet-to-be-determined properties on the collection indicate how to gain access to the collection.

The challenge with #2 as compared to #1 is maintaining a way for basic applications to determine which items have access restrictions. In Proposition #1, the mere presence of xyzURL is sufficient to detect that the item is access-controlled. In proposition #2, the collection attribute has multiple purposes. So, an application either has to know something about the nature of the collection or it has to parse the license field sufficiently to know the difference between an open and a restricted license.

Is this all making sense?

science commented 9 years ago

It is. I agree that the collection needs additional metadata in order to indicate whether it is a closed or open collection and, if closed, what mechanisms can be used to gain access. As @stuartasutton pointed out to me, collection itself is a CreativeWork, so you get lots of tools for making this work through community profile/practice. A collection can have a license, keywords and other metadata in it. For example:

collection:
   url: http://mycompany.com/collections/collectionabc
   license: http://mycompany.com/license/collectionabc
   keywords: [lti-enabled]

This is obviously overloading keywords to give clues, but the key point is that you wouldn't be relying on the license of the resource; you'd be relying on the license from the collection itself to tell you how to access the resource. In some ways, we've been advocating for this for a while: "If the license for the collection is not one you recognize as open, then assume it's closed. If you recognize the license in the collection as one for which you have an associated LTI key, then assume you can use your LTI key to activate authorized users to gain access."

Feels like it would work with no functional harm vs our original proposal, and doesn't require any new fields, except supporting the existing proposal for collections. Thoughts?

bredd commented 9 years ago

My only concern is the one I expressed before. There are many reasons why a work might belong to a collection -- only one of which is access management. In this scenario, you don't know whether access is restricted without reading the metadata of every collection to which a work belongs. Meanwhile, the mere presence of the accessManagedCollectionURL (or whatever it's named) indicates that an item is access-controlled.

science commented 9 years ago

Help me game that out a bit - I'm not clear that it's a real distinction. Let's take the same record (an sbac resource that is restricted to only state members) and render it in the two ways. Let's also take a khan academy video that is part of an open collection to compare it to.

# SBAC managed collection
url: http://sbac.org/1
license: http://sbac.org/license/state-only-resources
accessManagedCollectionURL: http://sbac.org/collection/2015
# SBAC generic collection
url: http://sbac.org/1
license: http://sbac.org/license/state-only-resources
collection: 
   url: http://sbac.org/collection/2015
   license: http://sbac.org/license/collection-2015
# Khan open license collection
url: http://khanacademy.org/1
license: http://creativecommons.org/cc-by-nc-3
collection: 
   url: http://khanacademy.org/collection/grade-3-math
   license: http://creativecommons.org/cc-by-nc-3

My thinking is that "If you consume a resource and you don't recognize the license of either the resource or a collection to which it belongs, then don't display the resource to your users."

So it's a whitelist model: you whitelist a set of licenses that you know how to handle (they are either open licenses, or you know that you have the access protocol to connect your users into it). Any license not on your list is rejected. You could supplement your metadata by crawling the resource url page and looking for license info there that you recognize (to handle cases of orgs who don't mark up their content correctly with license/collection data).

I think the accessManagedCollectionUrl approach is actually very similar: As a metadata consumer, you have to maintain a whitelist of managed collections that you know how to access. If you see a URL you don't recognize, you reject it. The difference is that in the collection system you also have to maintain a list of open license URLs that you also accept. This seems reasonable to me, since different orgs are going to allow different types of open resources onto their site, as well as closed collections (NC being the big one that some orgs can't show - so it's effectively a managed collection as well).
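
To compare the two whitelist checks side by side, here is a minimal Python sketch using the three records above as dictionaries. The function names and the decision to pass records that lack accessManagedCollectionURL are assumptions for illustration; the URLs are the illustrative ones from the examples.

```python
SBAC_MANAGED = {
    "url": "http://sbac.org/1",
    "license": "http://sbac.org/license/state-only-resources",
    "accessManagedCollectionURL": "http://sbac.org/collection/2015",
}
SBAC_GENERIC = {
    "url": "http://sbac.org/1",
    "license": "http://sbac.org/license/state-only-resources",
    "collection": {
        "url": "http://sbac.org/collection/2015",
        "license": "http://sbac.org/license/collection-2015",
    },
}
KHAN_OPEN = {
    "url": "http://khanacademy.org/1",
    "license": "http://creativecommons.org/cc-by-nc-3",
    "collection": {
        "url": "http://khanacademy.org/collection/grade-3-math",
        "license": "http://creativecommons.org/cc-by-nc-3",
    },
}


def accept_by_license(record, license_whitelist):
    """Collection approach: accept when the license of the resource or of
    its collection is one the consumer knows how to handle."""
    candidates = [record.get("license")]
    collection = record.get("collection")
    if collection:
        candidates.append(collection.get("license"))
    return any(c in license_whitelist for c in candidates if c)


def accept_by_managed_collection(record, collection_whitelist):
    """Managed-collection approach: only records carrying the field are
    checked; records without it are treated as not access-controlled."""
    managed = record.get("accessManagedCollectionURL")
    if managed is None:
        return True
    return managed in collection_whitelist


known_licenses = {
    "http://creativecommons.org/cc-by-nc-3",    # open license we accept
    "http://sbac.org/license/collection-2015",  # we hold an LTI key for it
}
print(accept_by_license(SBAC_GENERIC, known_licenses))
print(accept_by_managed_collection(SBAC_MANAGED, set()))
```

In the license version, one list has to carry both the open licenses and the closed ones the consumer holds keys for, which is the extra bookkeeping mentioned above.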

Thoughts?

bredd commented 9 years ago

I think we're getting to the nexus of the debate and I'm willing to be persuaded so long as we're clear about the implications of the decision.

When an application encounters an unrecognized license, it has two options:

1. Presume that access requires negotiation. This is what Steve (science) proposes above. The risk of this is that an application might deny access to content that was intended to be permissive. For example, suppose that Creative Commons updates to CC-BY 5.0. Content is posted that is available under this (permissive) license. But the application has not yet been updated to recognize this license (it only knows about previous versions of CC-BY), so it denies access unnecessarily.

2. Presume that access is permitted. This risks presenting content listings to the user that they are unable to access -- leading to frustration.

To me, neither of these options is satisfactory. To remedy this, the application needs a clue as to whether the license is permissive or not. Here are a few possibilities:

I'm sure there are other ways to flag this. The important thing to me is that the information should be included in the metadata of the item (or its collection) so that the application doesn't have to go searching through multiple indirections.

science commented 9 years ago

Concept/Proposal:

A) If the resource is tagged with license or collection metadata, assume the resource is closed, unless you recognize the license as being open. This sets the incentive on the community to tag things as open or they won't show up, which seems right. It also makes the resources more usable since the user shouldn't see closed resources that they don't have access to.

B) I would also propose that if there is no metadata at all about license or collection, then the resource should be assumed to be "open." This seems like a good way to "split the baby." The web is open by default and poorly marked up, so if you see metadata that doesn't have license data assume it is web accessible (aka minimally open).
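
Rules A and B can be sketched as a single decision function. This is a minimal Python illustration under stated assumptions: the record layout matches the earlier examples in this thread, `assumed_open` is a hypothetical name, and requiring every license found (resource and collection) to be recognized as open is one possible reading of rule A, not something settled here.

```python
def assumed_open(record, open_licenses):
    """Apply rules A and B to one metadata record.

    B: no license and no collection metadata at all -> assume open
       (web accessible).
    A: otherwise assume closed unless every license found on the
       resource and its collection is recognized as open.
    """
    licenses = []
    if "license" in record:
        licenses.append(record["license"])
    collection = record.get("collection")
    if collection and "license" in collection:
        licenses.append(collection["license"])

    if not licenses and collection is None:
        return True  # rule B: unmarked web content
    # rule A: tagged content must carry only recognized open licenses
    return bool(licenses) and all(lic in open_licenses for lic in licenses)


OPEN = {"http://creativecommons.org/cc-by-nc-3"}
print(assumed_open({"url": "http://example.org/page"}, OPEN))
print(assumed_open({"url": "http://sbac.org/1",
                    "license": "http://sbac.org/license/state-only-resources"},
                   OPEN))
```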

philbarker commented 9 years ago

hello all. I've just had a read through this thread, trying a little to play catch up, so first I would like to summarise a couple of points to check my understanding.

  1. it seems agreed that the disposition of useRightsUrl is that it is supersededBy license. I think that is the right thing to do. I think that redefining or even refining the meaning of useRightsUrl would be problematic.
  2. there is some interest in collection-level metadata, the latest news on that from schema.org is that it wasn't included in the 2.0 update released this week, but will be included in the first extensions to schema.org 2.0, when a few vocab issues have been ironed out, when they are ready to make an announcement of new extensions.

What remains are some really quite gnarly questions relating to access permissions. To address the whole thing probably means creating an entire extension around access rights management. I don't get the sense that anyone here wants to go too deeply into that, so there are some simplifications being made. That's all right so long as they are known to be simplifications and not assumed to address the more general issues. The learning technology standards community did flirt with rights expression and management about ten years ago, and a look back at some of the work done then might help clarify what is being simplified and whether the simplification is sufficient. One of the bigger simplifications in the discussion is to talk about "access" rather than all the different actions that may or may not be permitted.

Steve's concept/proposal seems reasonable. I would personally rephrase case B to stress that the resource may be assumed to be viewable, but not open in any broader sense.

I also quite like the idea of specifying an accessManagementUrl but maybe it is not needed. If it is, I think there is more work to be done to show how it works with a number of different access control mechanisms (low-level specs like OAuth & Shibboleth, or at a higher level IMS-LTI, or at an implementation level, people like http://www.ukfederation.org.uk/ and SBAC). If accessManagementUrl is taken forward, there needs to be consideration of how to make sure that people don't use it to point to something saying that the resource is open access; that would just be an unwelcome complication.

stuartasutton commented 9 years ago

While certainly not directly on-point, schema.org does have the property isAccessibleForFree https://schema.org/isAccessibleForFree (boolean) which could explicitly eliminate certain "is it" or "isn't it" questions.

bredd commented 9 years ago

Phil's comment has me thinking about the access protocol discussions that Steve Midgley and I have had. Indeed, we have a concept of a protocol for negotiating access and we are dramatically simplifying the information needed in order to minimize the impact on the metadata specs. For the sake of this record, I think it's worth explicitly stating what Steve and I have been assuming.

The basic idea is that the system providing the content (provider) and the system accessing (accesser) the content must negotiate access in a reasonably secure way. IMS LTI 1.0 offers a relatively broadly established protocol for doing so -- assuming that the provider and the accesser are both aware of the access controls and that the accesser has an encryption key that is recognized by the provider.

The minimum metadata needed to facilitate access negotiation is an ID of a collection. The idea is that the accesser has a catalog of collections to which it has permission. That catalog is indexed by a collection ID. When the accesser encounters a collection ID that it recognizes, then it retrieves the protocol and encryption key information from its catalog and uses that to negotiate access to the content.

It's with these assumptions in mind that we have reduced the LRMI/metadata requirement to a collection ID - in the form of a URL (actually a URI in this usage).
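
The accesser-side lookup described above might be sketched like this in Python. Everything here is illustrative: `AccessMechanism`, `find_access`, the catalog contents and the key value are hypothetical, and the field name accessManagedCollectionURL is only one of the names proposed in this thread. No actual LTI negotiation is performed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AccessMechanism:
    protocol: str  # e.g. "IMS LTI 1.0"
    key: str       # secret held by the accesser, never published in metadata


def find_access(record: dict,
                catalog: Dict[str, AccessMechanism]) -> Optional[AccessMechanism]:
    """Look up how to negotiate access for a record.

    The catalog is indexed by collection ID (a URI). None is returned when
    the record names no managed collection or names one that is not in
    the accesser's catalog.
    """
    collection_id = record.get("accessManagedCollectionURL")
    if collection_id is None:
        return None
    return catalog.get(collection_id)


catalog = {
    "http://sbac.org/collection/2015":
        AccessMechanism(protocol="IMS LTI 1.0", key="placeholder-key"),
}
record = {
    "url": "http://sbac.org/1",
    "accessManagedCollectionURL": "http://sbac.org/collection/2015",
}
mechanism = find_access(record, catalog)
print(mechanism.protocol if mechanism else "no known access mechanism")
```

Keeping the protocol and key in the accesser's private catalog is what lets the published metadata shrink to a single collection ID.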

science commented 9 years ago

@stuartasutton - I was redirected to this thread today from questions from interested parties (some people I've been proxying with my input).

isAccessibleForFree does not solve the key problem they have which is to detect login-walls. That's something intermediate between "free" and "paid".

license doesn't solve this either: you can put CC licensed resources (that you own) behind a login wall (or a paywall for that matter - IMS does this with their GPL licensed tech for example).

I'm not sure we have a method for letting people indicate login-required in any meaningful way. That said, groups that require login won't care to indicate login-required.

stuartasutton commented 9 years ago

@bredd, today there was finally a merge of schema.org/Collection by @danbri (type of CreativeWork) defined as "A created collection of Creative Works or other artefacts" https://github.com/schemaorg/schemaorg/commit/287a764dbf5bac18872de5792529fa837599b5c7. So we now have a Collection entity.

danbri commented 9 years ago

Let me clarify. It is merged into our working tree as part of the still under-development bibliographic extension. One of the steps that it still needs to go through is getting general agreement on the terms used. It is possible that consensus will be that a more specific term is appropriate (eg. CreativeWorkCollection).

philbarker commented 4 years ago

I'm going to mark this as closed. I think the original issue was dealt with. The ensuing conversation remains on record, and if anyone still wants to distil an actionable issue out of that conversation we should start a new issue for that. BTW, since this conversation, https://schema.org/usageInfo has been added to schema.org, which seems relevant though not exactly what we were talking about.