duraspace / pcdm

Portland Common Data Model
http://pcdm.org/models
Apache License 2.0

Complex digital objects in PCDM #67

Open simosacchi opened 7 years ago

simosacchi commented 7 years ago

What is the expected approach for modelling resources that require multiple files to be properly represented as an (intellectual) object as intended by their creator?

I am thinking of the following possible use cases, from the simplest to the most complex (mostly taken from a "research output" perspective):

…and I think I am barely scratching the surface here…

I am asking because in our institutional repository we expect more and more research outputs (especially from DH and data science folks) to take the shape of complex entities. The FileSet construct does not seem to me to be intended for grouping files other than master + derivatives, so my question is: how do we model complex digital objects in PCDM in a way that preserves their structure and, when possible, supports visualization, without over-modelling each individual component? (I know, tarballing is always an option, but with so many drawbacks...)

DiegoPino commented 7 years ago

@simosacchi interesting question:

At least from my perspective, if a particular file or a set of related files can live under a common metadata description (one that exists in a parent, top-level object for this aggregation), then a <pcdm:Object><pcdm:hasFile><pcdm:File> structure (a PCDM object with multiple binary children) would be adequate. So it depends on how we define that metadata and how badly we need to be able to access some of those files as a single object under a single entry page, e.g. to use a few of them under a single concept in a data paper. It seems to depend more on rdf:types other than the ones defined in PCDM itself, but that is my impression.

Several of these pcdm:Objects would then form a wider (semantically speaking) aggregation or intellectual representation, which could be defined as a "web page, scientific research project, software code, etc." using additional non-PCDM ontologies.
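As a rough illustration of the structure described above, here is a minimal sketch in plain Python that writes the triples out as tuples. The `http://example.org/` base URI, the identifiers and the helper names are all hypothetical; only the `pcdm:Object`, `pcdm:File`, `pcdm:hasFile` and `pcdm:hasMember` terms come from PCDM. Descriptive metadata would be attached to each object using other vocabularies, which are left out here.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library needed.
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base URI, for illustration only


def object_with_files(obj_id, file_ids):
    """Triples for one pcdm:Object that directly holds several pcdm:Files."""
    obj = EX + obj_id
    triples = [(obj, RDF_TYPE, PCDM + "Object")]
    for file_id in file_ids:
        file_uri = EX + file_id
        triples.append((file_uri, RDF_TYPE, PCDM + "File"))
        triples.append((obj, PCDM + "hasFile", file_uri))
    return triples


def aggregate(parent_id, member_ids):
    """Triples for a wider pcdm:Object that aggregates other objects."""
    parent = EX + parent_id
    triples = [(parent, RDF_TYPE, PCDM + "Object")]
    triples += [(parent, PCDM + "hasMember", EX + m) for m in member_ids]
    return triples


# Hypothetical research output: a dataset and its code, grouped into a project.
dataset = object_with_files("dataset", ["data.csv", "codebook.pdf"])
code = object_with_files("code", ["analysis.py"])
project = aggregate("project", ["dataset", "code"])
graph = dataset + code + project
```

What the "project" object actually is (a web page, a research project, software) would be stated with extra rdf:type triples from a non-PCDM ontology.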

I don't see the master -> derivative scenario as an imposed restriction for files under any pcdm:Object-like parent, even in the current discussion about pcdm:FileSet, mostly because I can't see how PCDM could impose such a restriction, which is clearly outside its semantic scope.

My way of defining this may be a bit simplistic, but it comes down to one general question: can a particular group of files be described (non-technical metadata, of course) using a single parent metadata resource? If yes, they all belong together under a single pcdm:Object, and the aggregation (or other semantic relationships) of several of these pcdm:Objects would then form a more advanced intellectual object. If those pcdm:Objects (the first ones we build, with directly attached files) could be used in multiple contexts, I would make them top-level objects and then use proxies to link to them. If they are effectively "contained" in their parent resources, or define them by their very existence (i.e. they are strongly semantically connected), then I would go for direct aggregation.
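The rule of thumb above could be sketched like this, again with triples as plain Python tuples. This is only one reading of the comment, not a PCDM requirement: the `link_member` helper, the `http://example.org/` URIs and the proxy URI pattern are hypothetical, while `ore:proxyFor` and `ore:proxyIn` are the ORE terms normally used for proxies.

```python
PCDM = "http://pcdm.org/models#"
ORE = "http://www.openarchives.org/ore/terms/"
EX = "http://example.org/"  # hypothetical base URI, for illustration only


def link_member(parent_id, child_id, reusable):
    """Link a child object to a parent, following the rule of thumb above.

    reusable=False: the child is strongly tied to this parent, so aggregate
                    it directly with pcdm:hasMember.
    reusable=True:  the child is a top-level object used in several contexts,
                    so reference it through a proxy scoped to this parent.
    """
    parent, child = EX + parent_id, EX + child_id
    if not reusable:
        return [(parent, PCDM + "hasMember", child)]
    proxy = f"{parent}/proxy/{child_id}"  # hypothetical URI pattern
    return [
        (proxy, ORE + "proxyFor", child),
        (proxy, ORE + "proxyIn", parent),
    ]


# A figure that only makes sense inside one paper: direct aggregation.
direct = link_member("paper", "figure1", reusable=False)

# A dataset cited from both a paper and a poster: one proxy per context.
in_paper = link_member("paper", "dataset", reusable=True)
in_poster = link_member("poster", "dataset", reusable=True)
```

Each context gets its own proxy resource, so context-specific statements can hang off the proxy without touching the shared dataset object.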

Most of the research ontologies I have seen, and the ones we used locally for biodiversity and biological scientific workflows, rely on annotations to give this tree-like graph of resources advanced semantic capabilities without breaking the base structure (or over-modelling it). Unfortunately, some of those ontologies have been disappearing for lack of support.

Currently this is on my radar for annotations:

http://www.w3.org/TR/annotation-vocab/

And these are still relevant, even though they have not been updated in a long time (they are ORE-based and could be rebuilt using PCDM): http://www.researchobject.org/scopes/ + http://wf4ever.github.io/ro-primer/

I really think this is a good discussion, thanks for bringing this up!

chmayo commented 7 years ago

The particular use case my institution is interested in is digital archival objects. Our current repository structure (which we will be migrating out of into Hydra) allows us to use METS structural metadata to preserve the physical arrangement of the collections we're digitizing for the archives (think collection, box, folder, etc.)

It's really important to us and to our archivists that we continue to be able to structure our digital items this way, not least because most of our archival material doesn't have item-level description, so the digital images of each piece of paper (or what have you) need to live together under an overarching collection or object that can have descriptive metadata attached.
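To make the shape of such a hierarchy concrete, here is a small sketch in plain Python that turns a nested collection > box > folder structure into PCDM-style triples. It is not the METS-to-PCDM mapping mentioned in this comment; the input dictionary layout, the `http://example.org/` URIs and the `hierarchy_to_triples` helper are assumptions made for illustration.

```python
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base URI, for illustration only


def hierarchy_to_triples(node, parent=None):
    """Walk a nested dict and emit (subject, predicate, object) triples.

    Each node has an 'id' and may have a 'type' ('Object' by default),
    a list of 'files' (page images) and a list of 'children' (sub-levels).
    """
    uri = EX + node["id"]
    triples = [(uri, RDF_TYPE, PCDM + node.get("type", "Object"))]
    if parent is not None:
        triples.append((parent, PCDM + "hasMember", uri))
    for name in node.get("files", []):
        triples.append((uri, PCDM + "hasFile", f"{uri}/{name}"))
    for child in node.get("children", []):
        triples.extend(hierarchy_to_triples(child, parent=uri))
    return triples


# Hypothetical arrangement: pages are described only at folder level.
collection = {
    "id": "collection1",
    "type": "Collection",
    "children": [{
        "id": "box1",
        "children": [{
            "id": "folder1",
            "files": ["page001.tif", "page002.tif"],
        }],
    }],
}

triples = hierarchy_to_triples(collection)
```

Descriptive metadata would then be attached at whichever level actually has it (here, the folder), with the page images hanging off that object as files.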

We're actually pretty thrilled that PCDM lets us do this in a way that we think makes sense (and also makes us a use case for the single Object with multiple file sets condition). I've been working on METS to PCDM mapping, and have samples of one of our METS files, a PCDM file that (hopefully) defines the same structure, and the XSLT I created to turn the one into the other available here: https://github.com/BCDigLib/Hybox-PCDM

I don't know if this helps or not, but it's another complex-objects use case that I think is pretty important with regard to archives and cultural heritage organizations.

kieranjol commented 3 years ago

Sorry for the bump, but I'm curious how you worked around this scenario, @chmayo?