duraspace / pcdm

Portland Common Data Model
http://pcdm.org/models
Apache License 2.0

Embargo predicates #70

Open bcail opened 7 years ago

bcail commented 7 years ago

Should embargo/lease predicates be added to PCDM? See initial discussion at https://groups.google.com/forum/#!topic/pcdm/d1W6IeqbLfM. If embargo predicates are added, it might be useful to add examples to the wiki for how to use the rights extension and the new embargo predicates.

escowles commented 7 years ago

I think this would be a good thing to add as a PCDM extension, either adding to the existing Rights vocab (http://pcdm.org/rights#) or as a separate Access extension vocab.

bryjbrown commented 7 years ago

Based off my experience with Islandora 1.x's embargo system and dealing with FSU's Graduate School in handling embargo requirements/management for ETDs, these are the details of embargoes that end up mattering the most:

Object vs File

In Fedora 3 this was implemented on a PID vs datastream level, but in F4/PCDM this would be phrased in terms of embargoing the parent object vs a child file (or perhaps fileset). The thing that is embargoed is inaccessible to non-admin users, so an embargoed object would not show up in search/browse interfaces at all, it would be effectively invisible to users. An embargoed file would have the parent object show up in browse/search interfaces, but the child file would not (also consider that the parent object's display might also have a display message about one of its child files being embargoed).

It seems like the best practice in the ScholComm/IR world is to use file embargoes so that users can see that an object exists, they just can't read/download it. There aren't many situations where an object embargo is useful, but they def. do come up. Perhaps an ETD contains offensive content or falsified data, so you don't want it to be accessible yet you are still obligated to preserve it.

IP ranges

An embargo should be configurable to be either global (applies to everyone) or ranged (users on a specified IP range bypass the embargo). Ranged embargoes occur most frequently for ETDs on our campus, but some non-IR objects end up getting ranged embargoes due to licensing restrictions that say it must only be available on-campus. These ranged embargoes can be bypassed off campus as well using something like EZProxy, but it still requires users to log in with a university ID, so these ranged embargoes also end up doubling as a way to restrict any users who aren't affiliated with your university.

Expiration

An embargo should be configurable to either automatically lift at a specified date, or to stay active until manually lifted by an admin. In Islandora 1.x, expiration dates can either be a W3CDTF or the string "indefinite" but there's probably a better way to handle this.
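One way to avoid treating "indefinite" as a magic string throughout an application is to normalize the stored value once, at the edge. The following is only a minimal sketch: `parse_expiration` and `is_embargoed` are hypothetical names, it handles only the full-date and full-datetime forms of W3CDTF, and it assumes UTC when no offset is given.

```python
from datetime import date, datetime, timezone
from typing import Optional

INDEFINITE = "indefinite"  # sentinel string described above


def parse_expiration(value: str) -> Optional[datetime]:
    """Return the expiry as an aware datetime, or None for an open-ended embargo."""
    text = value.strip()
    if text.lower() == INDEFINITE:
        return None
    if len(text) == 10:  # date-only form, YYYY-MM-DD
        d = date.fromisoformat(text)
        return datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_embargoed(value: str, now: datetime) -> bool:
    """An embargo is active if it never expires or has not expired yet."""
    expiry = parse_expiration(value)
    return expiry is None or now < expiry
```

With this shape, "stay active until manually lifted" is simply the `None` case, and the rest of the application never has to compare against a string.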

"Stacking"

There should be a way to "stack" embargoes on an object/file so that one embargo policy can expire into another. For instance, it's a very common practice here at FSU for our Grad School to say that an ETD should have its PDF globally embargoed for 2 years, and then the PDF should be on-campus access only indefinitely after that. We implement this by placing 2 different embargoes on it (the global inaccessibility overrides the on-campus access while the first embargo is active), and when the global embargo expires, it leaves the on-campus embargo active. Wearing my linked data hat, I'm inclined to believe that having two separate embargoes on an object/file is semantically valid, and that the application should be able to apply them simultaneously, with the more restrictive policy elements overriding the more permissive ones.
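The "more restrictive wins" rule can be sketched in a few lines. This is an illustration only, not how Islandora implements it: the `Access` levels, the `Embargo` class, and `effective_access` are all hypothetical names, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Iterable, Optional


class Access(IntEnum):
    """Access levels ordered so that a higher value is more restrictive."""
    PUBLIC = 0
    CAMPUS_ONLY = 1
    NONE = 2


@dataclass(frozen=True)
class Embargo:
    access: Access
    expires: Optional[date]  # None means "until manually lifted"

    def active_on(self, day: date) -> bool:
        return self.expires is None or day < self.expires


def effective_access(embargoes: Iterable[Embargo], day: date) -> Access:
    """Apply all active embargoes at once; the most restrictive one wins."""
    active = [e.access for e in embargoes if e.active_on(day)]
    return max(active, default=Access.PUBLIC)


# The ETD case above: a global embargo that expires into campus-only access.
etd_pdf = [
    Embargo(Access.NONE, date(2019, 5, 1)),
    Embargo(Access.CAMPUS_ONLY, None),
]
```

Here `effective_access(etd_pdf, date(2018, 1, 1))` gives `Access.NONE`, and `effective_access(etd_pdf, date(2020, 1, 1))` gives `Access.CAMPUS_ONLY`, with no need to rewrite anything when the first embargo lapses.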

bcail commented 7 years ago

@duraspace/pdcm-committers - any more comments before we work on a PR?

ruebot commented 7 years ago

I think it would be good if somebody can investigate any overlap here with use cases.

bcail commented 7 years ago

Thanks, @ruebot. I do see "embargo" there: http://www.sparontologies.net/ontologies/pso/source.html#d4e645. Maybe it would be better to use that ontology rather than add something to PCDM? If so, it'd be nice to list it as the preferred ontology for embargoes.

ruebot commented 7 years ago

@bcail exactly! It would be great if a couple people would be willing to do a survey of available ontologies, identify possible predicates that would be used and create recommendations, and identify where the gaps are, which should inform a path forward for extending PCDM.

bcail commented 7 years ago

I've started a document with some different ontologies: https://docs.google.com/document/d/1v8ZOUNM679iOjjRiIFHqQQykhPMyEXJ8MHlFJEVIw2s/edit?usp=sharing. Anyone is welcome to contribute.

bryjbrown commented 7 years ago

@bcail That doc is great, lots of good info there. It looks like there are many ways to say that something is embargoed, but not a lot of options for expressing how it is embargoed (policy details) that an application could make use of. For instance, it looks like pso:embargoed acts as a value for pso:PublicationStatus, but says nothing about when the embargo will expire. Hydra ACL looks like it comes the closest to meeting my personal requirements, but it's still not quite granular enough.

bryjbrown commented 7 years ago

If an embargo policy could be a resource itself, or more specifically an rdfs:Class like pcdm:Object/File/Collection, then a pcdm:Object or pcdm:File could link to a pcdm:Embargo URI (which would take care of the object vs. file feature mentioned above) and that embargo URI could have attributes attached like an expiration date or IP range, as well as other attributes that may be needed, like a display message or contact email.

escowles commented 7 years ago

I'm skeptical of making embargoes their own resources. That would be useful for handling multiple embargoes, but in most of the cases of multiple embargoes that I've heard (including the stacking comment above), I think one is an embargo, but the permanent campus-only status is the base rights/access status, not an embargo.

I would be more in favor of articulating the properties that apply to the embargo and seeing if we can attach them directly to the primary resource.

bcail commented 7 years ago

This issue is on the agenda for the January PCDM call (thanks Bryan).

bcail commented 7 years ago

@bryjbrown regarding the PSO ontology, it does have information about when an embargo will expire. You can use pso:holdsStatusInTime to specify various statuses the item will have (one of which can be pso:embargoed), and specify the time interval for that status, as well as an event that moves an item from one status to another. See https://semanticpublishing.wordpress.com/2013/03/01/lld5-using-spar-ontologies/ - search for "holdsStatusInTime".

bryjbrown commented 7 years ago

@bcail Interesting, here's the diagram in that page: pso:holdsStatusInTime

It looks like a document has the predicate pso:holdsStatusInTime with the object pso:StatusInTime, and that pso:StatusInTime can have lots of properties attached, such as ti:hasIntervalStartDate and ti:hasIntervalEndDate. If we could find a predicate that roughly translates to "is visible to this IP range" that takes an IP range (or something else to denote "all" or "none"), it could be attached to the pso:StatusInTime to show that this status comes with extra restrictions. This would solve the "stacking" problem I mentioned earlier as well by being able to attach different IP restrictions to different statuses.
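To make that shape concrete, here is a rough sketch using plain tuples rather than an RDF library. The pso: and ti: names are the ones mentioned in this thread; the dates are attached directly to the status node as described above, which may be simpler than what the ontology actually specifies. Everything in the ex: namespace, including `ex:visibleTo` and `ex:campusOnly`, is hypothetical.

```python
from datetime import date

# (subject, predicate, object) triples with prefixed names as plain strings.
# ex:visibleTo is a made-up stand-in for "is visible to this IP range".
TRIPLES = [
    ("ex:etd1", "pso:holdsStatusInTime", "ex:status1"),
    ("ex:status1", "pso:withStatus", "pso:embargoed"),
    ("ex:status1", "ti:hasIntervalStartDate", date(2017, 5, 1)),
    ("ex:status1", "ti:hasIntervalEndDate", date(2019, 5, 1)),
    ("ex:status1", "ex:visibleTo", "none"),
    ("ex:etd1", "pso:holdsStatusInTime", "ex:status2"),
    ("ex:status2", "pso:withStatus", "ex:campusOnly"),
    ("ex:status2", "ti:hasIntervalStartDate", date(2019, 5, 1)),
    ("ex:status2", "ex:visibleTo", "ex:campusNetwork"),
]


def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def statuses_on(resource, day):
    """Status values whose interval covers `day` (end date exclusive)."""
    current = []
    for node in objects(resource, "pso:holdsStatusInTime"):
        starts = objects(node, "ti:hasIntervalStartDate")
        ends = objects(node, "ti:hasIntervalEndDate")
        if starts and day < starts[0]:
            continue  # this status has not begun yet
        if ends and day >= ends[0]:
            continue  # this status is over
        current.extend(objects(node, "pso:withStatus"))
    return current
```

With this data, `statuses_on("ex:etd1", date(2018, 1, 1))` yields `["pso:embargoed"]` and `statuses_on("ex:etd1", date(2020, 1, 1))` yields `["ex:campusOnly"]`, so the hand-off between statuses falls out of the intervals themselves.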

What ontologies could we look at for specifying IP range accessibility?

DiegoPino commented 7 years ago

@bryjbrown looks to me that IP range/IP-based embargo could be more of an application or implementation concern, or even AuthZ via WebAC, than something you would like to keep with your RDF properties, at least if thinking about doing it at the document level. Maybe use some type of ACL (WebAC) tied to a specific IP-range-handling app? Since even if you put IPs in the RDF itself, using some kind of predicates, you still need to handle the restrictions (RW) via logic. And what if you need to change ranges or make it modal or open to a specific new network? Just my 2Cents

DiegoPino commented 7 years ago

@bryjbrown also: http://www.openlinksw.com/ontology/restrictions and some IP restriction predicates (oplacl:IPAddressCondition etc.), with examples coming from Virtuoso https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VAL_HttpRestrictions, that could help with dealing with restriction definition. This is meant to define network-agent restrictions, so i would assume some inter-linking through maybe user/groups or even organic groups to do so (my eyes on the pso:Agent probably)

bryjbrown commented 7 years ago

@DiegoPino These are great suggestions. My original thinking with keeping the IP ranges in the RDF is that if the object is moved or copied to another repository (migration, aggregation, etc.), these access restrictions go with it instead of being hidden away in the application layer. Islandora 1.x is a good example of this: when IR objects are migrated from Islandora 1.x to CLAW, we'll have to pull the IP embargo information out of the database. Keeping all of the restriction data in the graph would mean that other repositories could respect the embargo properties.

bcail commented 7 years ago

Here's our current use case: we have some ETDs that are embargoed, so they're only accessible to the Brown community. The embargoes can be extended upon request from the student. After the embargo expires, the dissertation should be publicly accessible.

We're thinking about using the following properties for storing this information:

- pso:withStatus pso:embargoed
- fabio:hasEmbargoDate "2018-11-27T00:00:01Z" (is there a way to specify that the literal is a datetime?)

(We can add more hasEmbargoDate properties if the embargo gets extended.)
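On the question of typing the literal: RDF supports typed literals, where the lexical value is followed by `^^` and a datatype IRI such as xsd:dateTime. The sketch below builds one N-Triples statement by hand with the standard library only. The function names and the example subject IRI are hypothetical, and whether xsd:dateTime or xsd:date is the better fit for fabio:hasEmbargoDate is left as an assumption.

```python
from datetime import datetime, timezone

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
FABIO = "http://purl.org/spar/fabio/"


def typed_datetime_literal(moment: datetime) -> str:
    """Serialize an aware datetime as an xsd:dateTime typed literal in UTC."""
    utc = moment.astimezone(timezone.utc)
    lexical = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'"{lexical}"^^<{XSD_DATETIME}>'


def embargo_date_triple(subject_iri: str, moment: datetime) -> str:
    """One N-Triples line stating the embargo date of a resource."""
    literal = typed_datetime_literal(moment)
    return f"<{subject_iri}> <{FABIO}hasEmbargoDate> {literal} ."


line = embargo_date_triple(
    "http://example.org/etd1",  # hypothetical resource IRI
    datetime(2018, 11, 27, 0, 0, 1, tzinfo=timezone.utc),
)
```

In Turtle with the usual prefixes declared, the same literal would be written `"2018-11-27T00:00:01Z"^^xsd:dateTime`.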

In the future, we may look at adding support for a more complex series of access restriction statuses, something like the following for a dataset:

- access restricted to only the researcher
- limited access to researcher and a small group of other individuals
- access to the whole Brown community
- public access

Seems like this could be at least partially described by the PSO ontology, with its series of statuses over time, although I'm not sure how to describe which people have access for the different statuses.

DiegoPino commented 7 years ago

@bryjbrown i understand the idea of moving data with their restrictions. My only suggestion would be to have the restriction definitions tied to a different resource (not in the "document", a.k.a. publication) and linked via an intermediate resource (a.k.a. agent, group, etc.) so you can really keep functional semantics separated from descriptive ones, updates are simpler (just one resource instead of 1000x), and you can also make use of the benefits of reuse in a graph or even inheritance. Still thinking about something like what WebAC provides, but for restrictions in a scholarly environment.

anarchivist commented 7 years ago

@DiegoPino @bryjbrown I don't want to lose the thread about embargo modeling, but it seems like the topics around IP-based authentication/authorization are probably worth a larger discussion. Do you think we should open a new issue to talk about this? This is an unresolved topic in the Hydra-in-a-Box modeling work (see hybox/models#52 and the linked issues), but I think having some broader conversations about whether this is an application engineering or modeling concern would be worthwhile.

(cc @mjgiarlo)

EDIT: I realize this is not necessarily a PCDM concern, nor specifically even a Fedora/LDP concern. Some assistance in how to share/frame a set of questions with the W3C's ReadWriteWeb CG would be useful, because I'm still struggling with the uncertain status of WebAccessControl and the supporting BasicAccessControl Ontology as a "standard" given the existence of widely differing implementations and extensions.

jcoyne commented 7 years ago

Please consider the case where you switch IPs. Do you really want to update all your objects? I think you should make this an AuthZ concern. Being in a specific IP address range makes you a specific class (group) of user. The use case is members of a certain group (e.g. The people located at Brown University) may view this object.

tdonohue commented 7 years ago

FWIW, DSpace implements IP authorization exactly how @jcoyne describes. In DSpace, one or more IP addresses (or ranges) can be "mapped" to a Group (e.g. "On Campus Users"). The Group is then given AuthZ rights. DSpace uses these same Groups to define embargo rights (e.g. an object is limited to view by "On Campus Users" until a specific date). So, I agree with @jcoyne here.
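The group-based approach can be sketched as follows. This is not DSpace code: `GROUP_NETWORKS`, `groups_for`, and `may_view` are hypothetical names, the "Anonymous" group is an assumption, and the networks are documentation-only example ranges.

```python
import ipaddress

# IP ranges are mapped to groups in one place; objects only ever name groups.
GROUP_NETWORKS = {
    "On Campus Users": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("2001:db8::/32"),
    ],
}


def groups_for(ip_text):
    """Groups a request belongs to, based only on its source address."""
    address = ipaddress.ip_address(ip_text)
    groups = {"Anonymous"}  # assumed default group for every request
    for group, networks in GROUP_NETWORKS.items():
        # Compare only within the same IP version.
        if any(address.version == net.version and address in net for net in networks):
            groups.add(group)
    return groups


def may_view(allowed_groups, ip_text):
    """True if the requester shares at least one group with the object."""
    return bool(groups_for(ip_text) & set(allowed_groups))
```

If the campus changes its address ranges, only `GROUP_NETWORKS` changes; an object restricted to "On Campus Users" stays untouched.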

bcail commented 7 years ago

So, regarding a vocabulary for defining embargo information, is the consensus for PCDM to recommend using the PSO ontology? If so, is there a place to make that recommendation? Somewhere on the PCDM wiki?

DiegoPino commented 7 years ago

@bcail i would guess that making some valid use case graphs + diagrams + some explanations like the ones here https://github.com/duraspace/pcdm/wiki/Diagrams-with-rdfpuml and sharing them in the wiki would be a good idea? Maybe it's a place where "integration with other ontologies" could fit, since i'm pretty sure that is a common use case and will also be an ever-growing one.

bcail commented 7 years ago

I'll work on some content to be added to the wiki, and then people will be able to comment on it and update it.

bcail commented 7 years ago

OK, I've added a short blurb with a few examples to the following document: https://docs.google.com/document/d/1v8ZOUNM679iOjjRiIFHqQQykhPMyEXJ8MHlFJEVIw2s/edit. Please take a look and suggest any changes/additions/corrections/... If it gets to the point where people are happy with it, hopefully we can add it to the PCDM wiki.

bcail commented 7 years ago

Are there any more suggestions/changes for the blurb I created? (Thanks to @bryjbrown for taking a look.) If there are no more changes to the text, I'll plan to put it on the wiki tomorrow.

DiegoPino commented 7 years ago

@bcail sorry, have been lazy on this. I will read your document today. Thanks for your work on this

bcail commented 7 years ago

I've added a new page to the wiki: https://github.com/duraspace/pcdm/wiki/Embargo-Vocabulary-Recommendation. This issue can be closed, as far as I'm concerned.