Islandora / documentation

Contains Islandora's documentation and main issue queue.
MIT License

Ability to distinguish different types of content and content that belongs to different institutions #478

Open uconnjeustis opened 7 years ago

uconnjeustis commented 7 years ago

This use case is slightly out of scope for Islandora-CLAW/CLAW#396, so it was moved to this new issue.

In the Islandora Metadata Interest Group, a discussion was started on OAI-PMH support. In addition to some desired features, the idea of namespaces came up. Our use case is different from that of @rosiel, so we wanted to add it here.

Title (Goal): Ability to distinguish and/or assign content to multiple institutions
Primary Actor: Sysadmin, Repository Admin, Repository curators
Scope: Islandora Site Architecture
Level: Medium?
Story: Currently, the Connecticut Digital Archive (CTDA) works with over 40 institutions that add and manage content in the repository and across multiple sites. To distinguish one institution's content from another's, CTDA uses namespaces. Each institution is assigned a range of namespaces. For example, 20002-29999 is the namespace range for UConn Archives & Special Collections (ASC). The range lets UConn ASC keep general content in the 20002 namespace, research data in 20003, and university records in 20004. Every institution has such a range, in which the first one or two digits never change. We use namespaces not only to distinguish content from different institutions, and different types of content within an institution, but also on our various sites; for example, we have sites for UConn ASC and the CT State Library. For CTDA, we really need an easy way for institutions and users to quickly determine whether content is theirs. Namespaces let us do that, especially because they appear in the PID, in the URL, etc. Going forward, we need a way to keep these institutional distinctions in place and to maintain them so that non-technical volunteers can easily assign content to a particular institution.
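The range scheme in the story can be pictured as a simple lookup table. This is a minimal Python sketch, assuming Fedora 3-style PIDs of the form `namespace:identifier` with numeric namespaces; only the UConn ASC range (20002-29999) comes from the story, and `NAMESPACE_RANGES` and `institution_for_pid` are hypothetical names.

```python
from typing import Optional

# Hypothetical range table: (first namespace, last namespace, institution).
# Only the UConn ASC range is taken from the use case; other rows are up to you.
NAMESPACE_RANGES = [
    (20002, 29999, "UConn Archives & Special Collections"),
]


def institution_for_pid(pid: str) -> Optional[str]:
    """Return the institution whose namespace range contains the PID's namespace.

    Assumes Fedora 3-style PIDs such as "20003:1234" (namespace, colon, local id).
    Returns None for non-numeric namespaces or namespaces outside every range.
    """
    namespace, _, _ = pid.partition(":")
    if not namespace.isdigit():
        return None
    ns = int(namespace)
    for first, last, institution in NAMESPACE_RANGES:
        if first <= ns <= last:
            return institution
    return None


print(institution_for_pid("20003:1234"))  # UConn Archives & Special Collections
```

The point of the sketch is that ownership is inferred from the identifier itself, which is exactly the coupling the rest of the thread discusses moving away from.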
ajs6f commented 7 years ago

Is this addressable via LDP containment? If so, that would be the most natural idiom.

uconnjeustis commented 7 years ago

We use namespaces to do this currently. What makes this easy is that we can quickly identify from the PID which institution owns the content. For any SPARQL or Solr query, it's possible to filter by namespace, which is great. Recently, I searched for OBJ size by namespace to determine how much each institution had uploaded (of course, only in terms of the latest OBJ datastream size). Namespaces are also convenient when sorting through harvest results: you can sort by namespace and produce nice reports for institutions that want an inventory of their metadata at the end of the year.
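The size-by-namespace report described above boils down to a group-and-sum. This minimal Python sketch assumes the query results have already been exported as (PID, OBJ size in bytes) pairs, for example from a Solr or SPARQL query; the sample rows are made up for illustration and `obj_bytes_by_namespace` is a hypothetical helper.

```python
from collections import defaultdict


def obj_bytes_by_namespace(rows):
    """Sum OBJ datastream sizes per PID namespace.

    rows: iterable of (pid, size_in_bytes) tuples, where pid looks like "20002:15".
    Assumes only the latest OBJ size per object is present in the input.
    """
    totals = defaultdict(int)
    for pid, size in rows:
        namespace = pid.split(":", 1)[0]
        totals[namespace] += size
    return dict(totals)


# Illustrative, made-up rows; real values would come from Solr or SPARQL.
sample = [("20002:1", 1_000), ("20002:2", 2_500), ("20003:7", 400)]
print(obj_bytes_by_namespace(sample))  # {'20002': 3500, '20003': 400}
```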

ajs6f commented 7 years ago

This may not be a good solution for the long term or at large scale. It's better to use opaque identifiers. Wouldn't your needs be met by using a property the values of which would partition your repository by institution?
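An opaque identifier carries no information about owner or content type; that information is recorded separately as data. A minimal Python sketch, assuming random UUIDs rendered as 32-character hex strings are acceptable identifiers (an illustrative choice, not a Fedora requirement); `mint_opaque_id` and the `institution` field are hypothetical.

```python
import uuid


def mint_opaque_id() -> str:
    """Mint an identifier that encodes nothing about owner or content type."""
    # uuid4 is random, so two calls collide only with negligible probability.
    return uuid.uuid4().hex


record = {
    "id": mint_opaque_id(),
    # Ownership is stored as a separate value instead of inside the identifier,
    # so it can change without renaming the resource.
    "institution": "uconn",
}
print(record)
```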

uconnjeustis commented 7 years ago

@ajs6f What do you mean by property the values?

ajs6f commented 7 years ago

...a property, the values of which.... (The values of that property) would partition...

acoburn commented 7 years ago

I believe that what is meant is something like this -- resources each use opaque identifiers (a very good idea) but then have a property that points to the institution managing that resource (there may be a more appropriate property, but this is an example):

</ae5e022f87f74c9a717> dcterms:isPartOf <info:repository/uconn> .

and:

</c03825dc32fab94c439ca> dcterms:isPartOf <info:repository/amherst> .
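The property approach can be tried out without a triplestore by treating each statement as a (subject, predicate, object) tuple. This is a minimal Python sketch, assuming the `dcterms:isPartOf` property and the `info:repository/...` URIs from the example above; in practice the same filter would be a SPARQL query or a Solr facet, and `resources_of` is a hypothetical helper.

```python
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# The two statements from the example above, as plain tuples.
triples = [
    ("/ae5e022f87f74c9a717", IS_PART_OF, "info:repository/uconn"),
    ("/c03825dc32fab94c439ca", IS_PART_OF, "info:repository/amherst"),
]


def resources_of(institution_uri, graph):
    """Return subjects whose dcterms:isPartOf value is institution_uri."""
    return sorted(s for s, p, o in graph if p == IS_PART_OF and o == institution_uri)


print(resources_of("info:repository/uconn", triples))  # ['/ae5e022f87f74c9a717']
```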

Or, simply via LDP containment:

</uconn/ae5e022f87f74c9a717>
</amherst/c03825dc32fab94c439ca>
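With containment, the owning institution is simply the first path segment. A minimal Python sketch, assuming resource paths are relative to the repository root and laid out as `/<institution>/<id>` exactly as in the example above; `institution_of` is a hypothetical helper.

```python
from typing import Optional


def institution_of(path: str) -> Optional[str]:
    """Return the top-level container (the institution) of a repository path.

    Assumes paths are relative to the repository root, e.g. "/uconn/ae5e...".
    Returns None for a resource that sits directly under the root.
    """
    segments = [s for s in path.split("/") if s]
    return segments[0] if len(segments) > 1 else None


for p in ["/uconn/ae5e022f87f74c9a717", "/amherst/c03825dc32fab94c439ca"]:
    print(p, "->", institution_of(p))
```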
DiegoPino commented 7 years ago

@acoburn I kinda like the idea of <> dcterms:isPartOf <info:repository/amherst> . which really could be any object property depending on each use case, but it could also be done as simply as a data property (a.k.a. a string), like flagging or tagging your resources: <> someont:inspaceof "amherst", right? For another use case: what about using WebAC for the same purpose? That's an AuthZ-based alternative, in addition to an extra property: by making good use of agents and groups, you get the accessible/not-accessible results and also make use of the "automagic" filtering by Fedora 4. Currently, in Islandora, namespaces are also used to exclude resources. My 2 cents.

ajs6f commented 7 years ago

Yeah, this is the idea (I'm not totally sure that dcterms:isPartOf is the best choice here, but that's not important). Either a property or LDP containment. The advantage of the LDP containment is that it is connected with authZ via WebAC. Then there isn't much need for a property.
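The connection between containment and authorization can be illustrated with a deliberately simplified model of WebAC-style inheritance, in which the nearest ancestor container that has an ACL decides access for everything beneath it. This Python sketch is a toy under stated assumptions, not Fedora's implementation: real WebAC ACLs are RDF documents using the W3C `acl:` vocabulary, and the `ACLS` table, group names, and helpers here are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical ACLs attached to containers: container path -> groups allowed to read.
ACLS: Dict[str, Set[str]] = {
    "/uconn": {"uconn-staff"},
    "/amherst": {"amherst-staff"},
}


def effective_acl(path: str) -> Optional[Set[str]]:
    """Walk up from the resource toward the root and return the nearest ACL."""
    segments = [s for s in path.split("/") if s]
    while segments:
        candidate = "/" + "/".join(segments)
        if candidate in ACLS:
            return ACLS[candidate]
        segments.pop()
    return None  # no ACL anywhere up the tree: deny by default in this toy model


def can_read(group: str, path: str) -> bool:
    allowed = effective_acl(path)
    return allowed is not None and group in allowed


print(can_read("uconn-staff", "/uconn/ae5e022f87f74c9a717"))      # True
print(can_read("uconn-staff", "/amherst/c03825dc32fab94c439ca"))  # False
```

Because access follows the containment tree, putting each institution's resources under its own container gives per-institution access control with one ACL per institution rather than one per resource.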

uconnjeustis commented 7 years ago

Thanks for the clarification, @ajs6f and @acoburn. I could be way off on this, but it seems that LDP containment might be the better way to go. Or is it better to have this information in more than one place? I'm not sure anyone would want to duplicate this information, but I thought I'd ask the question anyway.

ajs6f commented 7 years ago

I'd be inclined toward LDP containment. We haven't worked out a complete scheme for multitenancy from the Fedora side, but I don't think there is much question that it will pivot on LDP.

ajs6f commented 7 years ago

@ruebot Do you want to take this up on a CLAW call or too early?

acoburn commented 7 years ago

+1 on using LDP containment. That will also be much easier to make work with WebAC.

dannylamb commented 7 years ago

@ajs6f We're going to have to figure out some basic multitenancy scheme for fedora at some point. Both @rosiel's and this use case imply it.

And containment feels like a pretty natural way to attempt this.

And anyone can feel free to slap this onto the CLAW call agenda if they'd like. I think it'll eventually bring us to the conundrum we've had around translating 'islandora:root' from Fedora 3 to 4, and how it would work with multisites and multitenancy.

ruebot commented 7 years ago

@ajs6f what @dannylamb said:

anyone can feel free to slap this onto the CLAW call agenda if they'd like

rosiel commented 7 years ago

The main difference that I see between a property and LDP containment is that you can have

</ae5e022f87f74c9a717> dcterms:isPartOf <info:repository/uconn> .
</ae5e022f87f74c9a717> dcterms:isPartOf <info:repository/amherst> . 

But you probably can't have

</uconn/ae5e022f87f74c9a717>
</amherst/ae5e022f87f74c9a717>

LDP containment would therefore be more similar to the existing namespace method, and if it integrates with WebAC, all the better.
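The asymmetry described here can be shown in a few lines: a property can carry any number of values, while a path names exactly one top-level container. A minimal Python sketch reusing the example identifiers above; the tuple-and-string representation is illustrative only.

```python
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Property approach: one resource, two owners.
triples = {
    ("/ae5e022f87f74c9a717", IS_PART_OF, "info:repository/uconn"),
    ("/ae5e022f87f74c9a717", IS_PART_OF, "info:repository/amherst"),
}
owners_by_property = sorted(
    o for s, p, o in triples if s == "/ae5e022f87f74c9a717" and p == IS_PART_OF
)

# Containment approach: the path itself names the owner, so one URI means one owner.
path = "/uconn/ae5e022f87f74c9a717"
owner_by_path = path.strip("/").split("/")[0]

print(owners_by_property)  # ['info:repository/amherst', 'info:repository/uconn']
print(owner_by_path)       # uconn
```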

@dannylamb in this context what do you mean by "multitenancy"?

uconnjeustis commented 7 years ago

LDP containment does seem similar to the namespace method, since it can distinguish content by institution. Please forgive my ignorance... but to further distinguish different types of content within an institution, would it be possible to have something like...

</uconn/general/ae5e022f87f74c9a717>
</uconn/researchdata/ae5e022f87f74c9a717>
</uconn/univrecords/ae5e022f87f74c9a717>
</barnum/general/ae5e022f87f74c9a717>
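A two-level layout like this can be read back mechanically: the first segment is the institution and the second is the content type. A minimal Python sketch, assuming every resource sits exactly two containers below the root as in the paths above (other depths raise `ValueError`); `classify` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify(path: str) -> Tuple[str, str]:
    """Split "/<institution>/<content type>/<id>" into (institution, content type)."""
    institution, content_type, _resource_id = path.strip("/").split("/")
    return institution, content_type


paths = [
    "/uconn/general/ae5e022f87f74c9a717",
    "/uconn/researchdata/ae5e022f87f74c9a717",
    "/uconn/univrecords/ae5e022f87f74c9a717",
    "/barnum/general/ae5e022f87f74c9a717",
]

by_institution: Dict[str, List[str]] = defaultdict(list)
for p in paths:
    institution, content_type = classify(p)
    by_institution[institution].append(content_type)

print(dict(by_institution))
# {'uconn': ['general', 'researchdata', 'univrecords'], 'barnum': ['general']}
```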

ajs6f commented 7 years ago

@uconnjeustis Yes, absolutely. That's just the sort of thing for which LDP is meant to be used.

@rosiel Careful: you're wrong in that you certainly can have resources in more than one container (via Direct and Indirect container action), but you're right that they can't have more than one URI in a given API instance. It's a bit confusing that way.
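The "one URI, several containers" point can be sketched with a toy model of an LDP Indirect Container. Each child created in such a container is a small proxy, and the membership triple's object is taken from the proxy's `ldp:insertedContentRelation` value (here, the target resource's URI) rather than from the proxy's own URI. This Python sketch simplifies the W3C LDP behavior under stated assumptions: the container paths, proxy ids, and the choice of `pcdm:hasMember` as the member relation are hypothetical, and a real server computes these triples itself.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]
HAS_MEMBER = "http://pcdm.org/models#hasMember"


class IndirectContainer:
    """Toy model of an LDP Indirect Container (hasMemberRelation form only)."""

    def __init__(self, path: str, membership_resource: str, has_member_relation: str):
        self.path = path
        self.membership_resource = membership_resource  # ldp:membershipResource
        self.has_member_relation = has_member_relation  # ldp:hasMemberRelation
        self.proxies: Dict[str, str] = {}  # proxy URI -> target resource URI

    def add_proxy(self, proxy_id: str, target: str) -> Triple:
        """Create a proxy child and return the membership triple it produces."""
        self.proxies[f"{self.path}/{proxy_id}"] = target
        return (self.membership_resource, self.has_member_relation, target)


# One resource, one URI...
resource = "/uconn/researchdata/ae5e022f87f74c9a717"

# ...but a member of two hypothetical collections via proxies.
project_x = IndirectContainer("/uconn/projectX/members", "/uconn/projectX", HAS_MEMBER)
project_y = IndirectContainer("/uconn/projectY/members", "/uconn/projectY", HAS_MEMBER)
membership = [project_x.add_proxy("p1", resource), project_y.add_proxy("p1", resource)]

for s, _, o in membership:
    print(s, "hasMember", o)
```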

ajs6f commented 7 years ago

@uconnjeustis One point to consider: if you want to make the best use of LDP for that kind of problem, try to stick to classifications/categorizations that have a partitioning quality, i.e. for which each resource belongs to one and only one container. You can do more complicated things, certainly, but you start to slip toward a level of complexity at which you would do better to use a multivalued property. And consider how the various systems of categories interact. It's a design choice, and we need to look at a specific use case to make an informed decision.

For example, let's say that your various resources are never owned by more than one institution. Then using LDP to put them in containers-by-institution is a great idea:

</uconn/researchdata/ae5e022f87f74c9a717>
</uconn/projectX/ae5e022f87f74c9a717>
</uconn/projectY/ae5e022f87f74c9a717>
</barnum/researchdata/ae5e022f87f74c9a717>

But let's say that you want to be able to search across all research data at once, and that data in some specific project is also considered research data for the purposes of that search. Then you might do better to put that information (a type of resource) into a property, like a literal or an rdf:type.
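The trade-off can be made concrete by giving each resource both a container path and a set of rdf:type values, then comparing a container-based search with a type-based one. A minimal Python sketch; the paths come from the example above, while the type names under the `ex:` prefix and the two search helpers are hypothetical.

```python
RESOURCES = {
    "/uconn/researchdata/ae5e022f87f74c9a717": {"ex:ResearchData"},
    "/uconn/projectX/ae5e022f87f74c9a717": {"ex:ResearchData", "ex:ProjectRecord"},
    "/uconn/projectY/ae5e022f87f74c9a717": {"ex:ProjectRecord"},
    "/barnum/researchdata/ae5e022f87f74c9a717": {"ex:ResearchData"},
}


def by_container(segment):
    """Container-based search: only finds resources filed under that second-level container."""
    return sorted(p for p in RESOURCES if p.split("/")[2] == segment)


def by_type(rdf_type):
    """Type-based search: finds resources wherever they live in the tree."""
    return sorted(p for p, types in RESOURCES.items() if rdf_type in types)


print(len(by_container("researchdata")))  # 2
print(len(by_type("ex:ResearchData")))    # 3
```

The project X resource is research data but lives in a project container, so only the type-based search finds it; that is the case where a property serves better than containment.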

Fedora 4 offers you a much more flexible and powerful set of techniques and possible practices for data modeling, but data modeling is still work and it's still the heart of what it is to "do Fedora", so look forward to it!

dannylamb commented 7 years ago

@rosiel I'm using 'multitenancy' to describe when more than one group/organization is sharing a single Fedora.

dannylamb commented 6 years ago

Linking to https://github.com/Islandora-CLAW/CLAW/issues/926