Closed benwbrum closed 6 years ago
+1 to @benwbrum's observations.
Given the interest in scholarly uses of media components or fragments of larger resources, and the emergence of software platforms that facilitate these activities, being able to document the provenance of such components, fragments, etc. should be regarded at least as a best practice with regard to citation of origin, and a critical practice to maintaining functional, interoperable systems.
Using the `within` element as suggested could address this concern.
Yup, :+1: from me too.
Users assembling a new manifest from individual canvases via the drag-and-drop discovery pattern must communicate the manifest ID in addition to the canvas ID
Which manifest is the canvas within, at that point? The "originating" manifest, or the "new" manifest? Perhaps the question is, "Does dragging a canvas into a manifest create a new canvas, or does it reference the existing canvas?"
(or, for the nerds, are we doing copy-by-reference, or copy-by-value?)
Another potential hitch: what is the value of the `within` element? For manifests, it's common for `within` to point to a collection. The examples I've suggested mostly make sense with a manifest target, but a supplementary layer might point to a range or a sequence.
`within` must be many to many (each canvas can be within many manifests, each manifest can have many canvases).
And I think we should do copy-by-reference ... e.g. you reuse the same canvas URI, such that all of the annotations congeal around it.
I'd personally be OK with a changeable `within` target depending on context (though I prefer the many-to-many suggestion that would augment the dragged canvas's `within` with the new manifest ID while retaining the originating manifest ID), but think that we should avoid any attempt to change canvas IRIs -- otherwise collating annotations will be nigh impossible.
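As a rough illustration of the many-to-many, copy-by-reference approach described above, here is a minimal Python sketch. The helper name `add_within` and the plain-dict representation of a canvas are assumptions made for illustration, not anything defined by the Presentation API. The canvas `@id` is never touched; only the list of containers grows.

```python
def add_within(resource, container_id, container_type="sc:Manifest"):
    """Add a container to resource["within"] while keeping existing entries.

    Hypothetical helper: `within` may be a bare URI string, a single
    object, or a list, so it is normalised to a list of objects first.
    """
    existing = resource.get("within", [])
    if not isinstance(existing, list):
        existing = [existing]
    refs = [{"@id": e} if isinstance(e, str) else e for e in existing]
    # Skip the append if this container is already listed.
    if not any(r.get("@id") == container_id for r in refs):
        refs.append({"@id": container_id, "@type": container_type})
    resource["within"] = refs
    return resource


# A canvas dragged from "their-manifest" into "my-manifest".
canvas = {
    "@id": "their-canvas",
    "@type": "sc:Canvas",
    "within": {"@id": "their-manifest", "@type": "sc:Manifest"},
}
add_within(canvas, "my-manifest")
```

After the call the canvas keeps its original IRI and lists both manifests, originating one first.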
This will be very helpful for the independent publication of ranges, layers, etc.
Would it be helpful here to add the notion of a "canonical manifest" by which each canvas retains a tether to the original manifest in which the canvas id was minted?
I agree that the relation in general should be many-to-many, but it would be nice not to lose a sense of the id-minting authority, which always seems to come from the original, official manifest (ideally, the manifest published by the institution holding the material object).
Assert as many `within`s as you think will be useful to consumers of your content, and/or that you know about, including nested `within`s that walk up a containment hierarchy:
"@id" : "their-canvas",
"@type": "sc:Canvas",
...
"within" : [
{ "@id": "my-manifest", "@type": "sc:Manifest" },
{ "@id": "your-manifest", "@type": "sc:Manifest" },
{
"@id": "their-manifest",
"@type": "sc:Manifest",
"within": [
{ "@id": "their-collection-1", "@type": "sc:Collection" },
{ "@id": "their-collection-2", "@type": "sc:Collection" }
]
}
]
Don't take this too far though! Stop before you get to iiif-universe. Usefulness should determine how much context a fragment like a canvas or layer should carry around with it.
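For consumers, reading such a nested structure could look something like the sketch below. It is only an illustration: the function names `as_list` and `ancestors` are made up here, and the canvas is the example above expressed as a Python dict.

```python
def as_list(value):
    """Normalise a missing, single, or list-valued property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ancestors(resource, depth=1):
    """Yield (depth, container) for each embedded `within`, depth-first."""
    for container in as_list(resource.get("within")):
        if isinstance(container, str):
            # A bare URI carries no type or further nesting.
            container = {"@id": container}
        yield depth, container
        yield from ancestors(container, depth + 1)


canvas = {
    "@id": "their-canvas",
    "@type": "sc:Canvas",
    "within": [
        {"@id": "my-manifest", "@type": "sc:Manifest"},
        {"@id": "your-manifest", "@type": "sc:Manifest"},
        {
            "@id": "their-manifest",
            "@type": "sc:Manifest",
            "within": [
                {"@id": "their-collection-1", "@type": "sc:Collection"},
                {"@id": "their-collection-2", "@type": "sc:Collection"},
            ],
        },
    ],
}

for depth, container in ancestors(canvas):
    print("  " * depth + container["@id"])
```

Only what is embedded gets walked; nothing is dereferenced, so how much context a client sees is exactly how much the publisher chose to include.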
@tomcrane's suggestion does a very nice job of handling the mixed-target case, as well:
"@id" : "this-canvas",
"@type": "sc:Canvas",
"within" :
{ "@id": "containing-sequence",
"@type": "sc:Sequence",
"within": {
"@id": "containing-manifest",
"@type": "sc:Manifest"
}
}
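With mixed targets, a client that only cares about the manifest has to skip over the intermediate sequence. A possible sketch, assuming the embedded objects carry `@type` as in the example (the function name `nearest_container` is hypothetical):

```python
from collections import deque


def nearest_container(resource, wanted_type="sc:Manifest"):
    """Return the @id of the closest embedded container of the given type."""
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        within = current.get("within", [])
        if not isinstance(within, list):
            within = [within]
        for container in within:
            if isinstance(container, str):
                # Bare URI: its type is unknown without dereferencing it.
                continue
            if container.get("@type") == wanted_type:
                return container["@id"]
            queue.append(container)
    return None


canvas = {
    "@id": "this-canvas",
    "@type": "sc:Canvas",
    "within": {
        "@id": "containing-sequence",
        "@type": "sc:Sequence",
        "within": {"@id": "containing-manifest", "@type": "sc:Manifest"},
    },
}
manifest_id = nearest_container(canvas)
```

Note that bare-URI values are skipped here, which is one practical argument for the typed-object form over plain strings.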
👍 Locally we have a need to determine which annotation layers are in use on a canvas; the hierarchical example would work nicely.
I like the idea of an "originating authority" property somewhere on a resource. I think of it more as a "cite as", rather than an "owned by" or, in nerdspeak, a "canonical link".
re: @tomcrane's suggestion: I like it, though I think it needs to be discussed whether it's appropriate to partially embed resources into a manifest. It's useful in this situation, and I've found it useful in collections, for instance, when I'd like to add thumbnails to manifests without embedding their full context. But it seems like something that should be called out as possible for implementers, who might not currently be assuming that a given embedded representation of a resource is incomplete.
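Yes to all this. This is great. I especially like @tomcrane's convention of `{ "@id": "my-manifest", "@type": "sc:Manifest" }` instead of just a URI, which may need to be dereferenced if you have a value for a `sc:Sequence` and a `sc:Manifest`.
TL;DR: +10 to this discussion. `within` solves a bunch of issues and is a convention worth pushing. I think this also extends the cases for dereferenceable `sc:Canvas` objects to include `sc:Sequence` objects as well. As a tool developer, it would be crazy exciting to be able to just annotate a single canvas without having to build an entire infrastructure around it.
Sigh, that's a lot of details...
Within a single repository, this all works swimmingly. However, what we are basically doing is putting a map of the local graph (if I'm using the language correctly) into the object itself. If the `sc:Canvas` or `sc:Sequence` were dereferenceable (#980 https://github.com/IIIF/iiif.io/issues/980, gg➚ https://groups.google.com/d/topic/iiif-discuss/HZtInSSs_8k/discussion), we wouldn't be using this work-around (or at least work-alongside). All of the OP cases would be fixed if the URIs were a real location.
While `within` short-circuits a query call, it will become very fragile if widely used. Consider the following alarmist and complex, but not unrealistic, success of a document:
- e-codices puts up AA/4530 as a `sc:Manifest` (➚ http://www.e-codices.unifr.ch/metadata/iiif/saa-4530/manifest.json). Let's say they aren't awesome and Canvas "3" (http://www.e-codices.unifr.ch/metadata/iiif/saa-4530/canvas/saa-4530_003.json) does not resolve, but they have included clear `within` à la @benwbrum and @tomcrane.
- A transcription application wants to transcribe just the death records from 3 to 24, ignoring the rest of the 226 pages. Options are:
  - Leave all the canvases as they are and just trim the surplus in the UI. Create a standalone `Layer`/`annotationList` and point it on each `Canvas` `within` the `Manifest`.
  - Create a custom `Sequence` and copy all the canvases into it. This `Sequence` could still be `within` the canonical `Manifest`. In most cases, it would be prudent to create an entire derivative `Manifest` to replace the `Sequence`.
- A genealogist discovers one of the transcription annotations while creating a collection of evidence about the Burkhart family history. Immediately there are a few problems: [image: https://cloud.githubusercontent.com/assets/1119165/24974196/ef3ac31a-1f86-11e7-8210-edb0d198bffc.png]
  - In order to render the annotation, just to read the image snippet, the genealogist('s application) must download the original `Manifest` (read the annotation, `within` to the list or canvas, `within` the manifest) with all 226 pages and scan to `saa-4530_003` to get the image, then go back to the annotation to get the selector(s).
  - In the most likely case, it is the transcription application's version of the Manifest she will get (since that is what the `annotationList` is `within`), but the canonical one holds more metadata. (All the transcription tools I've been under the hood of do some version of forking the Manifest to get at the canvases.)
  - Even @tomcrane's `within` may list both manifests, but doesn't offer guidance on why one may be better. Hopefully, the internals of the manifests offer a hint to its derivative status (convention needed).
  - At the end of the day, to create a `Manifest` that aggregates the annotated history of the Burkhart family, the genealogist still faces all the same decisions as the transcription application.
- Now, hapless user tries to view the history of the Burkhart family. Every single new page requires a call to and crawl through a manifest that may be hundreds of canvases long to render. Except that we are saved because the genealogist('s application) had created yet another fork of the canvases and created a new Manifest with them completely resolved. What User doesn't know is that e-codices has corrected an error in the dates attributed to the manifest, updated and translated the description to indicate that this seems to be a contestable retro-record after the loss of official documents, and improved the resolution of the scans within. Furthermore, in the transcription application, the unhelpful label "3" has been replaced with a slightly more descriptive "2v".
  - None of the controversy surfaces, because the canvas is probably rendered straight from the v3 manifest, even if the `within` indicates there may be two other manifests/sequences within which it is sequenced, because the cost of checking is so high.
  - The localized description is unavailable for the same reasons.
  - The label will not be updated.
- and so we've broken the standard (to some extent):
  - helpful information is missing and User may make bad decisions because of it.
  - There are now 3 distinct versions of the `sc:Canvas` at `saa-4530_003` originated by e-codices, but they all live on the Internet and have the exact same URI. It is possible we could insist that the derivatives have a new URI and reference the one before it (and hopefully the date it forked and any relevant information about the scope of the fork). It is also possible developers will broadly ignore this extra step.
  - Anything asking the graph (remember this is an LOD fantasy world where the standard and infinite interoperability succeeded) for annotations on `saa-4530_003` will find anything hosted by e-codices (nothing at the moment and nothing at most repositories since metadata is the normal field to use to describe) and the transcription tool and the family history. That is great, except for the very careful work that must be done to render the returned annotation appropriately.
This scenario is fairly realistic (if futuristic), and not even the most problematic. Imagine if e-codices moves their images and breaks all the downstream copies (see the manuscript browser in T-PEN for thousands of broken links) or changes the default height of their canvases to be 2000, instead of the `naturalHeight` for their `full/full` images, sending all off-site annotations careening offscreen. There are plenty of issues facing IIIF as it succeeds, but this one may create a negative feedback cycle for each manifest used in an interoperable way.
So, in summary: +10 to this discussion. `within` solves a bunch of issues and is a convention worth pushing. I think this also extends the cases for dereferenceable `sc:Canvas` objects to include `sc:Sequence` objects as well. As a tool developer, it would be crazy exciting to be able to just annotate a single canvas without having to build an entire infrastructure around it.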
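The "download the whole manifest and scan to the canvas" step in the scenario above amounts to something like the following Python sketch. It assumes a Presentation 2 style manifest already parsed into a dict, with `sequences` holding `canvases`; the function name and the shortened IDs are illustrative only.

```python
def find_canvas(manifest, canvas_id):
    """Linear scan of every sequence for the canvas with the given @id."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            if canvas.get("@id") == canvas_id:
                return canvas
    return None


# Tiny stand-in for a 226-page manifest; IDs are shortened placeholders.
manifest = {
    "@id": "saa-4530/manifest.json",
    "@type": "sc:Manifest",
    "sequences": [
        {
            "@type": "sc:Sequence",
            "canvases": [
                {"@id": "saa-4530_002", "@type": "sc:Canvas", "label": "2"},
                {"@id": "saa-4530_003", "@type": "sc:Canvas", "label": "3"},
            ],
        }
    ],
}
canvas = find_canvas(manifest, "saa-4530_003")
```

The scan itself is cheap; the cost in the scenario comes from having to fetch and parse the full manifest for every annotation whose canvas does not resolve on its own.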
Re-reading this, and thinking about it, I think we're again talking about the issues that are caused by a certain level of unclearness around IIIF as a "Presentation API" and IIIF as an "RDF Data Model". If I'm reading Patrick's email correctly, I think the issue we're running into is that IIIF as a data model and IIIF as a Presentation API require different tradeoffs, and `within` seems to point some of them out.
I often think of a Manifest (or Collection, or Sequence, or Layer, or Range) as a meta-document that describes a conceptual aggregation of Canvases and Annotations (or Content, or Annotation Lists) which are "real" data entities. They represent an opinion of an aggregated, descriptive sequence of content, from the point of view of the publisher of the Manifest.
In the 'RDF' world of IIIF Universe (or the Semantic Web), there can be many statements made about any one of these, but as a publisher of a resource, I choose which ones of these make sense in the context of my Manifest. `within`, to me, is a way for me to state that in my context, these are the meta-documents that I view as relevant from the list of possible meta-documents that refer to the content.
For me, if I want to add additional content, it makes sense to "copy-by-reference" the 'real' data entries, but to 'clone and republish' the meta-data entries that point to my new manifest. If the original publisher wants to merge the two, or replace the previous manifest with my new, augmented one, that's their prerogative, but it's a new document that provides a different opinion around what constitutes the appropriate subset of the "IIIF-Universe" of content.
Because of this, I think that this is part of what makes Discovery so important: the idea that someone else is providing additional content and additional context around your content, and we should help provide systems that make it easier to re-integrate that content.
Great write up, Patrick!
I agree that within solves some issues, but I think that it comes at the problem (or at least one of the problems) from the wrong direction. If the resource is responsible for maintaining its own external relationships, accuracy will always be suspect in a distributed world. Instead, one should be able to discover all the places the resource "resides" by querying the "IIIF Universe". The ability to determine all the works that cite another work is a dream for scholars and would be a boon to researchers, and that is at least one facet of what we are talking about here. I agree with David that this is an area that the Discovery group should pursue as a long-term goal.
> `within` may list both manifests, but doesn't offer guidance on why one may be better. Hopefully, the internals of the manifests offer a hint to its derivative status (convention needed).
I agree that a convention is needed here. Again, being able to query for a resource's ancestors could more accurately tell a user all the resources that contain it, whereas the within value on the resource could unambiguously represent its origin.
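To make the "query for a resource's ancestors" direction concrete, here is a small Python sketch of the inverse lookup: instead of trusting each canvas to list its containers, an aggregator builds an index from the manifests it has harvested. The function name, the in-memory index, and the manifest IDs are assumptions for illustration; a real discovery service would look quite different.

```python
from collections import defaultdict


def build_containment_index(manifests):
    """Map each canvas @id to the set of manifest @ids that include it."""
    index = defaultdict(set)
    for manifest in manifests:
        for sequence in manifest.get("sequences", []):
            for canvas in sequence.get("canvases", []):
                index[canvas["@id"]].add(manifest["@id"])
    return index


original = {
    "@id": "original-manifest",
    "sequences": [{"canvases": [{"@id": "canvas-2"}, {"@id": "canvas-3"}]}],
}
derived = {
    "@id": "derived-manifest",
    "sequences": [{"canvases": [{"@id": "canvas-3"}]}],
}
index = build_containment_index([original, derived])
containers_of_canvas_3 = index["canvas-3"]
```

The index is only as complete as the set of manifests that were harvested, which is the discovery problem again, but it does not depend on any resource keeping its own `within` up to date.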
Finally, in a distributed world the practice of "clone and republish" should be discouraged as it works against our goals. I don't know who else besides David is doing this, but I'm curious about the rationale that makes this worth pursuing?
-Shaun
Notes:
- `within`. The external range references the resources in the Manifest, but it is NOT `within` the Manifest; e.g. if you traverse the Manifest JSON, you will not find the Range or Sequence.
- `within` ... however it's impossible to know whether or not there should be a `within`, as there might not be a parent object.
Proposal is to leave it as MAY in the specification due to the above. When the enclosed resource is NOT dereferenceable (e.g. a Canvas that isn't standalone) and is NOT within the current document (e.g. an annotation with a body or target of a Canvas in a different manifest), then the `within` is really a MUST ... but that is not this issue.
Do need a note for client developers to follow `within` or a `look-here` property.
Proposal: Add "When encountered in a stand alone resource, clients can follow this link to retrieve the encapsulating resource." to the definition of `within`.
For example, navigating up a collection hierarchy.
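A client following the proposed wording might do something along these lines. This is a sketch under stated assumptions: `resolve_parents` is a made-up name, and `fetch` stands in for whatever HTTP-plus-JSON retrieval the client already has (here it is replaced by a dictionary lookup so the example runs without a network).

```python
def resolve_parents(resource, fetch):
    """Dereference each `within` target, skipping any that fail to load."""
    within = resource.get("within", [])
    if not isinstance(within, list):
        within = [within]
    parents = []
    for ref in within:
        uri = ref if isinstance(ref, str) else ref.get("@id")
        try:
            parents.append(fetch(uri))
        except LookupError:
            # `within` is a MAY, so treat it as a hint that can go stale.
            continue
    return parents


# Stand-in for the network: only one of the two targets resolves.
store = {"their-manifest": {"@id": "their-manifest", "@type": "sc:Manifest"}}


def fetch_from_store(uri):
    return store[uri]  # raises KeyError (a LookupError) when missing


canvas = {
    "@id": "their-canvas",
    "@type": "sc:Canvas",
    "within": ["their-manifest", "missing-manifest"],
}
parents = resolve_parents(canvas, fetch_from_store)
```

Calling `resolve_parents` again on each returned parent is one way to navigate up a collection hierarchy.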
Closed by #1345
Standalone documents (such as de-referenced `canvas`, `layer`, or `sequence` elements) are of very limited use without any way to trace them back to an originating manifest. Consider the following use cases:
- `sequence` or a `range` representing a single text within a codex to be used preparing an edition of that text. While the whole codex (and its pages) are represented by an existing manifest and its default sequence, there is no way for scholars of that single text to use their sequence in isolation, despite it being the logical unit of work. Adding a `within` element to their sequence allows clients to interact with the originating manifest, extracting metadata, manifest-level annotations, services, and codicological information which may be relevant to editors.
- `within` element to the dereferenced canvas obviates this problem by allowing clients to retrieve the originating manifest.
- `within` element to stand-alone supplementary layers allows the annotations to be connected to the manifest they annotate.
Future versions of the Presentation API should recommend that any dereferenced object below the manifest level SHOULD contain a `within` element.