Closed by mjordan 5 years ago
@mjordan I agree that we need to at least suggest a common vocab to semantically denote the purpose and the origin of our binaries and how they relate to each other (in the RELS-INT fashion). I would say "ontology", but I still have my doubts about how complete an ontology #use was able to describe when it was last updated in 2015. What makes me a little nervous is deciding whether PCDM is really that widely used and maintained, and whether the ontology accepts changes and requests. All there is are subclasses of pcdm:file, and to be finer we might also need some predicates? Like derivedFrom, computedFrom, transformedFrom, and how?
What is Archivematica using? Or thinking about using in the future? Or ArchivesSpace? I'm wondering if we could work together with those communities too, to come up with a common ontology that is extendable and also derives from some other base ones (#use is not derived and has, e.g., no SKOS close concepts or anything that binds it to other ontologies), so we are not isolated.
In any case, I support the effort of standardising and using more ontologies for binary resources.
E.g.: in my own development I'm simply skipping derivative generation for images, dropping the need for extra Drupal nodes/media. I know it is totally not the Islandora way, but hey, IIIF already defines thumbnails and all other viewing options pretty well, and its ontologies are derived from other well-known and widely used ones. Generating derivatives makes little sense to me if caching systems like the ones provided by IIIF servers fulfil the promise and lower the processing needs (thumbnails are generated in real time and then cached via Cantaloupe). So in that case, IIIF ontologies (for viewing) are good. And the IIIF API 3.0 specs define video, etc. capabilities.
@DiegoPino I agree with you that it's worth looking beyond PCDM if it doesn't fit our needs. I take your point that #use has not been updated since 2016, but I don't think there's anything stopping us from proposing updates, including any predicates like the ones you suggest. I mentioned PCDM because it started out as being shared by a wider, allied community, but if that is no longer the case, let's accept that. Maybe there's an opportunity here to grow that broader community.
I don't think Archivematica describes individual files the way we'd need to; if anything, I am guessing it would look to PREMIS, which doesn't have vocabulary similar to PCDM Use (its objectCharacteristicsExtension semantic unit punts on the sort of thing we are talking about). I'm totally in favor of looking at IIIF, as long as it provides a useful vocabulary for objects other than images. Can you point me to documentation on IIIF's use of other ontologies?
What I think is very important to avoid is making up our own vocabulary without first eliminating others.
@mjordan Here's a mapping by @ruebot from way back when for moving from Fedora 3 to 4. It never got any traction at the time, which I imagine is because it was just too far out for people to see when the software didn't really exist yet. It just moves the DSID to a predicate as a straight-up string, and doesn't attempt to pick something out of an ontology or vocabulary. That doesn't solve this particular problem, but there are a lot of other mappings in there that may still be pertinent as we consider our migration strategy. I think it's definitely still worth a look.
I also agree that it's important not to make our own vocabulary. But I'll take that a step further and say that we also don't need to identify an ontology/vocabulary that perfectly fits our needs, because it probably doesn't exist. Mixing and matching seems to be the way to go so long as you're not semantically violating the original intent of the various sources.
@dannylamb thanks for the pointer to that earlier work. I had forgotten about that. Also:
mixing and matching
++
@dannylamb that's the way I was thinking, for our purposes I've been looking at different ontologies to see what "pieces" work for us and making a list of things to use in different situations...still very much a work in progress but there is a lot out there
@DiegoPino FYI, if you went and replaced the pcdmuse terms with ones from any other vocab/ontology and then updated a handful of views and contexts in Drupal, you can completely bend the derivative system (and whatever else) to your will. It'd be nice to see someone do that with something non-pcdm, as I know you are not the only one out there interested in other ontologies. What part of IIIF in particular are you looking at?
I can't find the IIIF ontology that Diego alluded to, but as for PCDM, it does seem to cover 95% of our datastreams. For instance, as Mark asked about Large Image's JP2 and JPG: I think they both qualify as ServiceFile because they're both served as the web presentation of the object in different contexts (if you have a large image viewer and if you don't, respectively). That is, multiple files may have the same PCDM #use, but their mimetypes would have them handled differently.
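To illustrate the point about two files sharing a PCDM #use but being handled differently by mimetype, here is a minimal Python sketch. The handler names and the fallback are hypothetical and not part of Islandora or PCDM; only the ServiceFile URI comes from this thread.

```python
SERVICE_FILE = "http://pcdm.org/use#ServiceFile"

# Hypothetical handler names, keyed on (use class, mimetype).
HANDLERS = {
    (SERVICE_FILE, "image/jp2"): "large_image_viewer",
    (SERVICE_FILE, "image/jpeg"): "plain_image_tag",
}


def pick_handler(use_uri, mimetype):
    """Choose a presentation handler; fall back to a plain download link."""
    return HANDLERS.get((use_uri, mimetype), "download_link")
```

Both the JP2 and the JPG carry the same use class here, and only the mimetype decides how each is presented.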
In many cases, our PreservationMasterFile is also the OriginalFile - PDFA is the only case I can think of where they may be different. Can we apply multiple #use classes to the same file? I would think we could.
I like the idea of applying a framework - the PCDM ontology or any other - because it lets us semantically define what these 'datastreams' are in a way that isn't just an Islandora convention that grew organically. I'm not saying that we shouldn't mix and match from different ontologies if necessary, but that this might be a good opportunity to really examine how the datastreams we have fit together.
A way to describe a derivative's origin (in RELS-INT fashion) would be fantastic, but we don't model that right now. Depending on what ontology we use (and if it has application to the 'DSID' problem), it may warrant a separate discussion/ticket. For which I offer: CRMdig, an ontology about digital provenance. https://www.ics.forth.gr/isl/index_main.php?l=e&c=656
I don't think there's anything prohibiting you from slapping multiple pcdmuse types on a single object. And yeah... we can switch everything over to Original File. I made everything Preservation Master as a best guess.
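As a rough sketch of what multiple pcdmuse types on one file could look like in RDF, the Python below emits one rdf:type statement per use class in N-Triples form. The file URI and the function name are made up for illustration; RDF itself allows a resource to have any number of types.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PCDM_USE = "http://pcdm.org/use#"


def type_triples(file_uri, use_classes):
    """Return one N-Triples rdf:type statement per PCDM Use class."""
    return [
        "<{}> <{}> <{}{}> .".format(file_uri, RDF_TYPE, PCDM_USE, cls)
        for cls in use_classes
    ]


# Hypothetical file URI, for illustration only.
triples = type_triples(
    "http://example.org/files/1",
    ["OriginalFile", "PreservationMasterFile"],
)
```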
When you're asking about a derivative's origin, you mean linking back to the Original File from all the derivatives? That'd be nice to have in link headers as well as the RDF.
@rosiel Totally egregious, but there's a "convertedFrom" in the IANA link registry: https://www.iana.org/assignments/link-relations/link-relations.xhtml
It's meant for moving from draft to proposal to release candidate status for specs, but hey, we sure are converting those derivatives from a source....
@dannylamb,
The document linked to was later converted to the document that contains this link relation. For example, an RFC can have a link to the Internet-Draft that became the RFC; in that case, the link relation would be "convertedFrom".
I don't think it's "meant for moving from draft to…", I think that's just an example. But maybe not?
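For concreteness, a derivative pointing back to its source through a Link header might be formatted as in the sketch below. The target URI and helper name are invented for illustration, and whether "convertedFrom" is semantically appropriate for derivatives is exactly the open question above.

```python
def link_header(target_uri, rel):
    """Format a single Link header value: <target>; rel="relation"."""
    return '<{}>; rel="{}"'.format(target_uri, rel)


# Hypothetical URI of the Original File a derivative was generated from.
header_value = link_header("http://example.org/files/original", "convertedFrom")
```

The resulting value would be sent as `Link: <http://example.org/files/original>; rel="convertedFrom"` on the derivative's response.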
So is an OBJ from 7.x an "Original File" or a "Preservation Master File"? I've been saying Preservation Master, but looking at it now... I'm having second thoughts.
IMO it's an "Original File". As far as I know, the only standard 7.x solution pack that creates what we should consider a preservation master is the PDF SP, which optionally creates a PDF/A.
I'm playing around with this mapping in migrate_7x_claw and I've come up with
7.x | CLAW |
---|---|
OBJ | http://pcdm.org/use#OriginalFile |
PDFA | http://pcdm.org/use#PreservationMasterFile |
OCR | http://pcdm.org/use#ExtractedText |
TN | http://pcdm.org/use#ThumbnailImage |
MEDIUM_SIZE | http://pcdm.org/use#ServiceFile |
JP2 | http://pcdm.org/use#IntermediateFile |
RELS-EXT | http://islandora.ca/ontology/relsext# |
DC | http://purl.org/dc/elements/1.1/ |
MODS | http://www.loc.gov/mods/v3 |
TECHMD | http://hul.harvard.edu/ois/xml/ns/fits/fits_output |
All the XML datastreams (RELS-EXT, DC, MODS, TECHMD) seem like the sort of thing that should be parsed and applied as fields on the node (RELS-EXT, DC, and MODS) or the Original File (TECHMD). But I don't see any harm in having tags to identify them for now while we sort out all the xpaths.
FYI totally guessing on those last four. I just threw in the namespaces for their respective ontologies. Open to suggestion on those for sure.
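For anyone who wants to experiment, the table above could be expressed as a simple lookup, roughly like this Python sketch. The URIs are copied from the table; the function name and the None fallback for unmapped DSIDs are assumptions for illustration, not how migrate_7x_claw implements it.

```python
PCDM_USE = "http://pcdm.org/use#"

DSID_TO_URI = {
    "OBJ": PCDM_USE + "OriginalFile",
    "PDFA": PCDM_USE + "PreservationMasterFile",
    "OCR": PCDM_USE + "ExtractedText",
    "TN": PCDM_USE + "ThumbnailImage",
    "MEDIUM_SIZE": PCDM_USE + "ServiceFile",
    "JP2": PCDM_USE + "IntermediateFile",
    # Namespace tags for the XML datastreams (guesses, per the comment above).
    "RELS-EXT": "http://islandora.ca/ontology/relsext#",
    "DC": "http://purl.org/dc/elements/1.1/",
    "MODS": "http://www.loc.gov/mods/v3",
    "TECHMD": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output",
}


def uri_for_dsid(dsid):
    """Return the mapped URI for a 7.x DSID, or None if it is unmapped."""
    return DSID_TO_URI.get(dsid.strip().upper())
```

Returning None for something like AUDIT keeps the gaps visible instead of guessing a URI.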
Should we include AUDIT in this list? Not sure what URI to suggest at this point... other than the one used in the Fedora 3.8 FOXML: info:fedora/fedora-system:format/xml.fedora.audit.
Sorry, that should be info:fedora/fedora-system:def/audit#.
What does info:fedora resolve to? http://www.fedora.info/definitions/1/0/? Just a guess from looking at a FOXML file, but I can't actually find the ontology on the net. http://fedora.info/definitions/1/0/access/ObjState is the closest thing I can find.
@mjordan I threw up a PR just to get it out there. I'll add AUDIT once we can figure out a url for it.
I'm sure this issue is floating just under the surface of a lot of what we've already accomplished (e.g., we're already using the term "Preservation Master" for images), but it might be useful to discuss defining a standard mapping between Islandora 7.x DSIDs and PCDM Use classes. For example:
http://pcdm.org/use#OriginalFile
http://pcdm.org/use#ExtractedText
http://pcdm.org/use#ThumbnailImage
There will be some gaps here, largely dependent on the idiosyncrasies of 7.x solution packs. For example, what is the corresponding PCDM Use class for the large image solution pack's JPG and the JP2 datastreams?
Using standard mappings will help us:
http://pcdm.org/use#PreservationMasterFile resource.
In general, we should ask "if we don't use PCDM Use classes to characterize binary resources, what vocabularies do we use?"