Simpler serialization for single-rendition publications

dauwhe commented 8 years ago

The vast majority of EPUBs contain a single rendition. Given that, what are the advantages and disadvantages of requiring a rendition object or array in the JSON package? Would it be desirable to omit the rendition object when it is not required?

{
    "metadata": {
        "title": "hello, world"
    },
    "links": [{
        "href": "index.html"
    }],
    "manifest": {
        "links": [{
            "href": "style.css"
        }]
    }
}

Advantages:

Simplicity for authors
Less nesting
Covers 99%+ of publications

Disadvantages:

data model is less consistent
doesn't provide separation of bibliographic and rendition metadata for single renditions

HadrienGardeur commented 8 years ago

I think the example that you're using here is incorrect:

you can't use links at the top level of the document to express the spine, since this would mix the spine with other links (alternate link to a traditional EPUB document, link to external records and such)
this would therefore re-introduce a "spine" element or something similar
if we re-introduce a "spine" element, then we're back to where we were with the proposal to use "renditions"

HadrienGardeur commented 8 years ago

After carefully thinking about this and looking at your other issues, I think I'm getting a better idea of what you're objecting to, @dauwhe.

I believe that what you actually don't like isn't truly related to "renditions" or any nitty gritty details, it has to do with having a common structure throughout this manifest.

IMO there are two ways this can go, either we adopt a common structure (everything is a collection, collections are made of metadata, links and subcollections like in EPUB 3.1) or we create specialized elements for everything.

What you're proposing here is a middle ground, which won't work since we're not getting the benefits of a common structure (consistency & extensibility) or the benefits from a specialized structure (compact syntax).

A specialized structure would look something like that:

{
  "title": "Moby Dick",
  "author": "Herman Melville",
  "identifier": "978031600000X",
  "http://example.org/extension/metadata": "some metadata",
  "spine": [{"href": "index.html", "type": "text/html"}],
  "manifest": [{"href": "style.css", "type": "text/css"}],
  "links": [{"rel": "alternate", "href": "book.epub", "type": "application/epub+zip"}],
  "http://example.org/extension/collection": [{"href": "index.html", "type": "text/html"}]
}

It's probably the type of compactness you're looking for (well except for having a spine+manifest but that's a different story) but there are downsides:

how can I tell the difference between an extension for metadata and an extension for a collection? Basically I can't.
each collection potentially behaves differently, there's no consistent syntax for them
basically, we don't have a collection model anymore, it's a specialized vocabulary where if you don't know the term, you can't have any expectation about it

Having worked on many different APIs and standards for APIs, I would never trade a standardized collection and link model for a specialized vocabulary, that's the opposite of a best practice when designing something RESTful. But quite a few people in the JSON world care more about compactness than consistency and extensibility, so I guess that the specialized vocabulary would be an easier sell for them.

iherman commented 8 years ago

I am not sure I understand the previous comments, @HadrienGardeur. I believe what @dauwhe is referring to is actually pretty simple. At present, I mean in the current proposal, his example:

{
    "metadata": {
        "title": "hello, world"
    },
    "links": [{
        "href": "index.html"
    }],
    "manifest": {
        "links": [{
            "href": "style.css"
        }]
    }
}

must be represented as

{
    "metadata": {
        "title": "hello, world"
    },
    "rendition" : [{
        "links": [{
            "href": "index.html"
        }],
        "manifest": {
            "links": [{
                "href": "style.css"
            }]
        }
    }]
}

and the question is whether, in this simple case, the extra layer created by "rendition" is necessary or not.

In my view it is not. It is a very widespread pattern to use some sort of "default" or "implied" information in cases like this, and to avoid the extra layer unless it is necessary (for example because multiple renditions are indeed required). To take examples from my own experience in other groups, this design choice was made in the definition of TriG, which encodes RDF datasets as an extension to Turtle; in JSON-LD, which does not require the @graph key at the top level unless the graph is not tree-like; and in the metadata specification for CSV files, where the same JSON-LD (metadata) file can provide metadata for several CSV files, but the extra layer is used only if it is really necessary and can be skipped if only one CSV file is annotated that way.

I do understand the value of a unified data model. And it is actually fine to define the data model in such a way that renditions are indeed some sort of top-level concept. But we should not mix up the data model and its JSON serialization. It is perfectly fine to say that the relevant processor must use an implied "rendition" so that the data model is easily satisfied, while keeping things simple for the user. This is actually the approach the groups took in all the examples cited above.

Bottom line: I am in favour of keeping the simple case simple and not requiring "rendition" unless it is necessary.
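The "implied rendition" idea can be sketched in a few lines of Python. This is a minimal sketch, not part of either proposal: the function name `with_implied_rendition` is hypothetical, and it assumes, as in the two examples above, that "links" and "manifest" are the only top-level keys that belong to a rendition.

```python
import copy

# Assumption: the only keys that belong to a rendition, per the examples above.
RENDITION_KEYS = ("links", "manifest")


def with_implied_rendition(doc: dict) -> dict:
    """Return a copy of `doc` in the explicit "rendition" form.

    Documents that already contain "rendition" are returned unchanged;
    otherwise the top-level rendition keys are moved into one implied rendition.
    """
    result = copy.deepcopy(doc)
    if "rendition" in result:
        return result
    implied = {key: result.pop(key) for key in RENDITION_KEYS if key in result}
    result["rendition"] = [implied]
    return result


simple = {
    "metadata": {"title": "hello, world"},
    "links": [{"href": "index.html"}],
    "manifest": {"links": [{"href": "style.css"}]},
}
print(with_implied_rendition(simple))
```

A processor applying this rule sweeps every top-level "links" entry into the rendition, so it only holds if top-level links are never used for anything other than the reading order.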

HadrienGardeur commented 8 years ago

Well let me explain my first concern all over again.

Currently, "links" at the publication level is used to express information that is quite different from the "spine". Actually, for multiple reasons (I'll post an issue later) I strongly believe that a link@rel="self" is required on every single BFF out there.

Let's adapt the example that @iherman just used:

{
  "metadata": {
    "title": "hello, world"
  },
  "links": [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
    {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"}
  ],
  "rendition": [{
    "links": [{"href": "index.html"}],
    "manifest": {
      "links": [{"href": "style.css"}]
    }
  }]
}

Now you can clearly see why the content of rendition/links can't be moved into links without having an impact: it's not meant to express the same information, and if you mix them up it ends up being a mess.

To separate the publication's links from the "spine", you need to wrap the spine in a separate element. Call it "rendition" or something else, it doesn't really make that big of a difference.
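A small Python sketch makes the concern concrete, using the example above (both function names are hypothetical). With the "rendition" wrapper the reading order is simply the first rendition's "links"; if the same entries were merged into the top-level "links", a reader that treated those as the reading order would also pick up the "self" and "alternate" links.

```python
def reading_order(doc: dict) -> list[str]:
    """Reading order when the spine lives inside the (first) rendition."""
    return [link["href"] for link in doc["rendition"][0]["links"]]


def naive_reading_order(doc: dict) -> list[str]:
    """Reading order if spine items were mixed into the top-level links."""
    return [link["href"] for link in doc.get("links", [])]


doc = {
    "metadata": {"title": "hello, world"},
    "links": [
        {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
        {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"},
    ],
    "rendition": [{
        "links": [{"href": "index.html"}],
        "manifest": {"links": [{"href": "style.css"}]},
    }],
}

print(reading_order(doc))  # ['index.html']

# Merge the spine into the top-level links, as a flattened syntax would.
merged = dict(doc, links=doc["links"] + doc["rendition"][0]["links"])
print(naive_reading_order(merged))
# ['http://example.org/bff.json', 'http://example.org/publication.epub', 'index.html']
```

Filtering on the presence of "rel" would happen to work for this document, but that is a convention a processor would have to assume rather than something the structure expresses.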

My second comment is more of a "meta" comment over several issues that @dauwhe has posted (#13, #14 and #21).

iherman commented 8 years ago

I am sorry, I am probably having a slow day, but I still do not get it. What you say seems to suggest that the same term (link) is used for two different purposes with different structures. If that is the case, this is bad design that must be changed. I still do not see how that fundamentally affects the issue raised by @dauwhe.

HadrienGardeur commented 8 years ago

I don't think that's bad design, we see that pattern in other standards too. For example atom:link in an atom:feed is a link related to the feed, while atom:link in an atom:entry is a link related to the entry.

Same idea here. "links" at the top level are about the publication, while the ones in "rendition" are related to the rendition.

iherman commented 8 years ago

And not all standards reflect good design (only):-) I can say that, I make a living on them!

HadrienGardeur commented 8 years ago

Well, I strongly believe that for API design, generic structures (collection, link) are incredibly better than specialized vocabularies. I don't want to re-invent the wheel all the time in order to express the same information over and over again.

HadrienGardeur commented 8 years ago

As we've discussed during the F2F, I don't think that we can make the syntax any easier and drop the "rendition" key.

To close this issue I propose to replace it with two new issues instead:

should we use "rendition" for the name of our key?
do we want to support multiple renditions in BFF (now that we're not tied to 3.1)?

iherman commented 8 years ago

@HadrienGardeur you don't think we can make the syntax any easier, and I still do not see why. There has been no conceptual answer to my comment https://github.com/dauwhe/epub31-bff/issues/13#issuecomment-195349811, except that other specifications also use the same key for different purposes. As far as I am concerned, that is not an argument against @dauwhe's original example in https://github.com/dauwhe/epub31-bff/issues/13#issue-139958113, and I have not seen any convincing arguments yet...

I believe we agreed at the F2F that we would have the same examples clearly described in a simple approach and following your syntax to make a clear choice. We should decide based on those; the best is to use that example in a new issue (as you propose) for the first item.

As for the second issue, I agree that this is a separate issue and should be treated as such.

HadrienGardeur commented 8 years ago

I've actually provided a very good reason why we can't drop the "rendition" key (or whatever we call it): we need to separate the links related to a publication from what's essentially the spine.

As for using the link element in different collection roles, we already do the same thing in EPUB, not just in other specifications. For a conceptual answer to that question, I would say that there are essentially two approaches to that problem:

you can identify the core concepts and building blocks and build your syntax that way
or you can have an approach where you create specialized elements for each and every use case

Historically, the IDPF has IMO followed the second path, which is why we have multiple elements which pretty much do the same thing with a slightly different syntax. I strongly believe that this approach is weak and usually results in a lot of bloat. Identifying core concepts and building blocks feels far more sustainable, and leads to a smaller set of elements along with a more extensible syntax.

iherman commented 8 years ago

I've actually provided a very good reason why we can't drop the "rendition" key (or whatever we call it): we need to separate the links related to a publication from what's essentially the spine.

Let us be precise: you have provided a reason why you think it is good. Your argument, in this example, boils down to the problem of using the same element in two different contexts, with the 'rendition' element introduced as the way of resolving it. I agree it is one way of doing this; I do not agree that it is the only way, nor that it justifies this extra complication in the structure. Using two different element names, for example, achieves the same separation while keeping the overall structure much simpler. If this is the only reason why you want to keep things separated, it is not convincing.

Again, I believe we agreed at the F2F to have a clear, possibly non-trivial example shown using both the simple and the complex approaches, without (for now) considering multiple renditions. Once we agree on the two syntax versions, we can decide.

HadrienGardeur commented 8 years ago

No, I've provided a valid reason, since it's absolutely necessary to know the difference between these two things.

As for using a different element sure, but how is that different from using a "rendition" key like we do right now? It won't change the fact that we can't avoid having an element for that.

iherman commented 8 years ago

On 8 Apr 2016, at 15:25, Hadrien Gardeur notifications@github.com wrote:

No, I've provided a valid reason, since it's absolutely necessary to know the difference between these two things.

As for using a different element sure, but how is that different from using a "rendition" key like we do right now? It won't change the fact that we can't avoid having an element for that.

Can we stop this ping-pong, please?

I would like to see a larger, non-trivial example, encoded in @dauwhe's version, and the same example (without pulling in extra features) in your approach, and decide then.

HadrienGardeur commented 8 years ago

Sure, I've already provided that in a previous reply, I can do it again.

Here's the current syntax, adapted with some of the things discussed recently:

{
  "metadata": {
    "title": "hello, world"
  },
  "links": [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
    {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"}
  ],
  "rendition": {
    "metadata": {"layout": "reflowable"},
    "links": [{"href": "index.html", "type": "text/html"}],
    "manifest": [{"href": "style.css", "type": "text/css"}]
  }
}

Now, with the proposed syntax from Dave the main difference would be that:

the rendition specific metadata (layout, accessMode and others) would have to be with the publication metadata
we have to create a new element to identify the spine (I'll use sequence since that's something that @dauwhe used before)
we can't really support multiple renditions with that syntax

An updated version of what @dauwhe proposed could look like this:

{
  "metadata": {
    "title": "hello, world",
    "layout": "reflowable"
  },
  "links": [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
    {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"}
  ],
  "sequence": [{"href": "index.html", "type": "text/html"}],
  "manifest": [{"href": "style.css", "type": "text/css"}]
}

As long as it looks something like that, I'm fine with both examples, but @dauwhe's initial proposal can't work.
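The differences listed above are mechanical enough to write down as a conversion. This Python sketch (the name `to_sequence_syntax` is hypothetical) turns the current "rendition" syntax into the proposed "sequence" syntax. It assumes a single rendition, as in the two examples, and it raises an error where the flat form cannot carry the information: more than one rendition, or a rendition metadata value that conflicts with the publication metadata.

```python
import copy


def to_sequence_syntax(doc: dict) -> dict:
    """Flatten a single-rendition document from the "rendition" syntax
    into the "sequence" syntax (hypothetical helper, not part of any spec)."""
    result = copy.deepcopy(doc)
    rendition = result.pop("rendition")
    if isinstance(rendition, list):
        if len(rendition) != 1:
            # The flat syntax has nowhere to put additional renditions.
            raise ValueError("the sequence syntax cannot express multiple renditions")
        rendition = rendition[0]
    # Rendition-specific metadata (e.g. layout) moves up into the publication metadata.
    metadata = result.setdefault("metadata", {})
    for key, value in rendition.get("metadata", {}).items():
        if key in metadata and metadata[key] != value:
            raise ValueError(f"conflicting values for metadata key {key!r}")
        metadata[key] = value
    # The rendition's links become the reading order.
    result["sequence"] = rendition.get("links", [])
    if "manifest" in rendition:
        result["manifest"] = rendition["manifest"]
    return result


current = {
    "metadata": {"title": "hello, world"},
    "links": [
        {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
        {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"},
    ],
    "rendition": {
        "metadata": {"layout": "reflowable"},
        "links": [{"href": "index.html", "type": "text/html"}],
        "manifest": [{"href": "style.css", "type": "text/css"}],
    },
}
print(to_sequence_syntax(current))
```

Applied to the first example, this produces the second one: "layout" ends up next to "title", which is exactly the loss of separation between publication and rendition metadata.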

iherman commented 8 years ago

Thank you. I would prefer @dauwhe decide what he proposes and take it from there.

All that being said: I believe we also said at the F2F that we should not get into JSON-specific syntax issues for now. Instead, we should concentrate on the general model and on what we want to express via the manifest in a somewhat more abstract form, and worry about the detailed JSON syntax later…


HadrienGardeur commented 8 years ago

I've also proposed a model for that in the F2F detailed agenda.

Here's what it looks like:

publication
    metadata
    links
    rendition (one or more)
        metadata (zero or one)
        links
        manifest (zero or one)
        other collections (zero or more)
    other collections (zero or more)

If we drop the rendition key, limit the publication to one rendition and use sequence instead, it would look like this:

publication
    metadata
    links
    sequence
    manifest (zero or one)
    other collections (zero or more)

For the two issues that I've suggested, the first one is tied to the syntax (and JSON) while the other one is conceptual. I don't think we have to prioritize one over the other as long as we identify and separate such discussions.
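For the flattened outline, here is a small Python sketch of how a processor might read a document against it. The required keys come from the outline; treating every unrecognised key as one of the "other collections" is an assumption, and `read_outline` is a hypothetical name.

```python
# Keys named in the single-rendition outline; "manifest" is the only optional one.
REQUIRED = ("metadata", "links", "sequence")
OPTIONAL = ("manifest",)


def read_outline(doc: dict) -> tuple[list[str], list[str]]:
    """Check `doc` against the single-rendition outline.

    Returns (missing required keys, keys treated as other collections)."""
    missing = [key for key in REQUIRED if key not in doc]
    known = set(REQUIRED) | set(OPTIONAL)
    others = sorted(key for key in doc if key not in known)
    return missing, others


flat = {"metadata": {"title": "hello, world"}, "links": [], "sequence": [{"href": "index.html"}], "article": []}
print(read_outline(flat))    # ([], ['article'])

nested = {"metadata": {}, "links": [], "rendition": {}}
print(read_outline(nested))  # (['sequence'], ['rendition'])
```

The second call shows that the two outlines are not interchangeable: read against the flat one, a rendition is just another collection and the reading order goes missing.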

iherman commented 8 years ago

I've also proposed a model for that in the F2F detailed agenda.

Here's what it looks like:

publication
    metadata
    links
    rendition (one or more)
        metadata
        links
        manifest (zero or one)
        other collections (zero or more)
    other collections (zero or more)

If we drop the rendition key, limit to one rendition and use sequence instead it would look like that instead:

publication
    metadata
    links
    sequence
    manifest
    other collections (zero or more)

I would concentrate on this option for now. Just checking, to be sure of the details:

metadata: means the usual things (author, title), plus, e.g., terms from ONIX or BIBO or whatever.

sequence: this is the equivalent of the spine?

links: are to 'external' resources. All of them, or only those that do not appear in the sequence?

manifest: I am not really sure what this means. It is not metadata; is it whatever is necessary for the correct processing of a publication (e.g., the various locators that the Locator TF worked on at the DPUB IG)? There should be a clear difference between this and what metadata means.

other collections: I am not sure what this means.

HadrienGardeur commented 8 years ago

It's basically like classic EPUB 3.1, the only different term is sequence:

metadata: in this specific case, metadata about the collection with the usual things that you listed (title, author, publication date, publisher)
sequence: same thing as the spine, it's a list of links in reading order
links: these links are for other use cases than the spine or the manifest, for example a link to an external record or a link to the same publication in a ZIP container (classic EPUB)
manifest: this is a list of resources that are necessary to display/cache the publication but not part of the sequence (images, CSS, fonts for example)
other collections: in EPUB 3.0.1 and EPUB 3.1 it's possible to group resources together and indicate what they are together. For example if you have a short story collection (that's the publication) you could say that the first three HTML resources are actually the same short story (and provide the metadata for that specific short story).

HadrienGardeur commented 8 years ago

Here's an example to understand the "other collection" part of it better (I'm using JSON because it's easier to understand that way):

{
  "metadata": {
    "title": "PWP Primer",
    "layout": "reflowable"
  },
  "links": [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
    {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"}
  ],
  "sequence": [
    {"href": "article1.html", "type": "text/html"},
    {"href": "article1-continued.html", "type": "text/html"},
    {"href": "article2.html", "type": "text/html"}
  ],
  "manifest": [{"href": "style.css", "type": "text/css"}],
  "article": [
    {
      "metadata": {"title": "The Concept of PWP", "author": "Ivan Herman"},
      "links": [{"href": "article1.html"}, {"href": "article1-continued.html"}]
    }, {
      "metadata": {"title": "The PWP Manifest", "author": "Hadrien Gardeur"},
      "links": [{"href": "article2.html"}]
    }
  ]
}

In this example:

the publication is called "PWP Primer" and it has three different HTML resources in its reading order
the first article is called "The Concept of PWP" and it's written by Ivan Herman, and we know that this article is spread along two different HTML resources
the second article is called "The PWP Manifest" and it's written by Hadrien Gardeur, it's limited to a single HTML resource
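A reading system could use the "article" collections to find which article each resource in the reading order belongs to. Below is a minimal Python sketch over a trimmed copy of the example (the helper name is hypothetical). It also rejects collection links that are not in the sequence; the example satisfies that check, but the thread does not state it as a rule.

```python
def article_for_each_resource(doc: dict) -> dict[str, str]:
    """Map each href in the sequence to the title of the article that contains it."""
    in_sequence = {link["href"] for link in doc["sequence"]}
    owner = {}
    for article in doc.get("article", []):
        title = article["metadata"]["title"]
        for link in article["links"]:
            if link["href"] not in in_sequence:
                raise ValueError(f"{link['href']} is not in the sequence")
            owner[link["href"]] = title
    return owner


primer = {
    "metadata": {"title": "PWP Primer", "layout": "reflowable"},
    "sequence": [
        {"href": "article1.html", "type": "text/html"},
        {"href": "article1-continued.html", "type": "text/html"},
        {"href": "article2.html", "type": "text/html"},
    ],
    "article": [
        {"metadata": {"title": "The Concept of PWP", "author": "Ivan Herman"},
         "links": [{"href": "article1.html"}, {"href": "article1-continued.html"}]},
        {"metadata": {"title": "The PWP Manifest", "author": "Hadrien Gardeur"},
         "links": [{"href": "article2.html"}]},
    ],
}

owners = article_for_each_resource(primer)
for link in primer["sequence"]:
    print(link["href"], "->", owners.get(link["href"], "(no article)"))
```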

Collections were introduced in EPUB 3.0.1 because EPUB was limited to:

metadata about a publication
metadata about a specific resource (using the "refines" mechanism that we're dropping in EPUB 3.1)

If you wanted to provide any kind of information about a group of resources (or even group resources together at all), well you couldn't.

Collections have been used in many different specifications since then:

to indicate which resources are part of a preview/sample for a publication
to indicate the presence of a dictionary or an index
to indicate that parts of a publication can be extracted and used on their own (distributable objects)

The core idea behind BFF is that we can extend this use of collections everywhere in EPUB, which means that publications, renditions, the spine and the manifest are all collections too.

dauwhe commented 8 years ago

I do dislike the term "spine", but given that this is still EPUB and not PWP, I'd be happy to s/sequence/spine/. And here, "spine" is very close in concept to EPUB Classic.

Reserving link for these special links is good, I think. And I think this does make possible a lot of interesting things.

So I'd propose (in text just to make it easier to read):

metadata
   title
   identifier
   language
   modified
links?
spine
   ch01
   ch02
manifest 
   css
   images

So the case of a single rendition is very simple. Adding additional renditions does involve a new array:

metadata
spine
manifest
renditions 
   metadata
   spine
   manifest
collection?
collection?

The downside is that there's not a clear separation of rendition metadata and publication metadata in the default rendition but I'm willing to pay that price to make 99.9% of books easier.

Here's a small example in JSON:

{
  "metadata": {
    "title": "Moby-Dick",
    "language": "en",
    "identifier": "978031699999X",
    "modified": "2016-02-01T15:45:00.000Z",
    "layout": "reflowable"
  },
  "spine": [{
    "href": "c001.html",
    "type": "text/html"
  }, {
    "href": "c002.html",
    "type": "text/html"
  }],
  "manifest": [{
    "href": "style.css",
    "type": "text/css"
  }],
  "links": [{
    "href": "/search?q={query}",
    "type": "text/html",
    "rel": "search",
    "templated": true
  }],
  "renditions": [{
    "metadata": {
      "layout": "pre-paginated"
    },
    "spine": [{
      "href": "p001.html",
      "type": "text/html"
    }, {
      "href": "p002.html",
      "type": "text/html"
    }],
    "manifest": [{
      "href": "style-fixed.css",
      "type": "text/css"
    }]
  }]
}
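In this proposal the top-level metadata, spine and manifest double as the default rendition, and "renditions" holds only the additional ones. A Python sketch of how a processor might enumerate them, using a trimmed copy of the JSON above; `list_renditions` is a hypothetical name, and no metadata is inherited between renditions here.

```python
def list_renditions(doc: dict) -> list[dict]:
    """Default rendition (built from the top-level keys) first, then the additional ones."""
    default = {key: doc[key] for key in ("metadata", "spine", "manifest") if key in doc}
    return [default] + list(doc.get("renditions", []))


moby = {
    "metadata": {"title": "Moby-Dick", "language": "en", "layout": "reflowable"},
    "spine": [{"href": "c001.html", "type": "text/html"}, {"href": "c002.html", "type": "text/html"}],
    "manifest": [{"href": "style.css", "type": "text/css"}],
    "renditions": [{
        "metadata": {"layout": "pre-paginated"},
        "spine": [{"href": "p001.html", "type": "text/html"}, {"href": "p002.html", "type": "text/html"}],
        "manifest": [{"href": "style-fixed.css", "type": "text/css"}],
    }],
}

for rendition in list_renditions(moby):
    layout = rendition["metadata"].get("layout")
    hrefs = [link["href"] for link in rendition["spine"]]
    print(layout, hrefs)
```

The default rendition's metadata is the publication's metadata object itself, which is the missing separation acknowledged above.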

dauwhe commented 8 years ago

And it's positively lovely in YAML, which is probably how I'd author in real life ;)

metadata: 
   title: Moby-Dick
   language: en
   identifier: 978031699999X
   modified: 2016-02-01T15:45:00Z
   layout: reflowable

spine:
 - href: c001.html
   type: text/html
 - href: c002.html
   type: text/html

manifest:
 - href: style.css
   type: text/css

links: 
 - href: /search?q={query}
   type: text/html
   rel: search
   templated: true

renditions:
  - metadata: 
      layout: pre-paginated

    spine:
     - href: p001.html
       type: text/html
     - href: p002.html
       type: text/html

    manifest:
     - href: style-fixed.css
       type: text/css

HadrienGardeur commented 8 years ago

Now that we've introduced this notion of "compact collection" (a collection with no metadata or sub-collections, where the syntax is limited to link objects) I think that we agree on how things work for single rendition (with no support for additional renditions).

If we decide to also support multi-renditions, I still prefer Matt's suggestion to use renditions instead of spine and get the separation between publication/layout metadata as a bonus. I really don't think that the added nesting is that bad. If you're really uncomfortable with the use of renditions for single rendition documents, another alternative is to limit its use to multiple rendition documents (but all renditions would be listed under renditions not just the secondary ones).

A few additional comments:

in your proposed model and syntax for multiple renditions, you're using a spine sub-collection. This is one way to do it, but we could also use links for that purpose as we've done before (unless you see another use case for links at a rendition level?)
how will rendition selection work? Do you also add those to the publication's metadata?
what about other metadata for the main rendition? What if for example I have a multi-lingual book where each rendition has a different title? How do I separate the main rendition metadata from the publication's metadata?

These last two points are roughly the same issue, but it's related to all the things that are rendition-specific (layout properties, metadata and rendition selection) and that we'll have a hard time separating from the publication metadata if we follow your proposal.

dauwhe commented 8 years ago

I think it works if we say any metadata on the publication is inherited by a rendition, but is overridden by metadata in that rendition. And it's easy enough to put selection metadata for the primary rendition in the top-level metadata:

metadata
   title: Moby-Dick;  or, the whale
   language: en
   label: 'English version'
...
renditions
   metadata
      title: Moby-Dick ; ou, le Cachalot
      language: fr
      label: 'Version française'

Are you aware of existing conflicts between rendition metadata and other metadata? As long as we don't use the same name with different meanings, I think we're OK.
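The inheritance rule proposed here amounts to a shallow dictionary merge in which the rendition's values win. A minimal Python sketch, with a hypothetical function name and the values from the example above:

```python
def effective_metadata(publication: dict, rendition: dict) -> dict:
    """Publication metadata inherited by a rendition, overridden wherever the rendition sets a key."""
    return {**publication, **rendition}


publication = {"title": "Moby-Dick; or, the whale", "language": "en", "label": "English version"}
french = {"title": "Moby-Dick ; ou, le Cachalot", "language": "fr", "label": "Version française"}

print(effective_metadata(publication, {}))                  # default rendition: the English values
print(effective_metadata(publication, french)["language"])  # fr
```

A merge like this gives each rendition its own values, but the top-level dictionary still has to serve as both the publication's metadata and the default rendition's, so a publication-wide title distinct from every rendition title has nowhere to go.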

HadrienGardeur commented 8 years ago

What you're proposing is slightly better than the current situation (in classic EPUB), but still doesn't fully solve that problem.

If a publication has a title and two renditions, and each rendition has its own title, then there's no way to express that information. In your example, it's also inaccurate or incomplete to say that the publication language is English, when another rendition is also available in French.

Having all renditions in renditions solves all those problems. As I've suggested before, if you really don't want to have it for single-rendition publications, we could limit its use to multiple-rendition publications.

dauwhe commented 8 years ago

Ah, I think I understand. In EPUB Multiple-Rendition Publications, you can optionally have a title element in metadata.xml, which a reading system could interpret as the title of the publication, independently of the title of any particular rendition.

I think I'm OK with having everything in rendition if there is more than one rendition.

HadrienGardeur commented 8 years ago

OK, good to have something we can agree on for both types of publications.

What about the following question:

in your proposed model and syntax for multiple renditions, you're using a spine sub-collection. This is one way to do it, but we could also use links for that purpose as we've done before (unless you see another use case for links at a rendition level?)

iherman commented 8 years ago

Doesn't it sound more logical to use the 'spine' term for this? It seems to be consistent with the single rendition case...

Agree that it would be more consistent, but that's still different to what we had before, which is why I want to make sure that we're all on the same page.

iherman commented 8 years ago

I have a terminological issue; better take it now.

If we look ahead a bit in the direction of PWP (which I believe is what we have to do), then we have to be careful with the term "manifest". For me, the "manifest" should be the whole thing we are talking about here, and not some sub-part. This would be in line with the way the term is used on the Web, and it would help us align, in the long term, with the ongoing work on Web Manifests (even if that alignment will require additional work).

Looking back at the thread, @HadrienGardeur said:

links: these links are for other use cases than the spine or the manifest, for example a link to an external record or a link to the same publication in a ZIP container (classic EPUB)
manifest: this is a list of resources that are necessary to display/cache the publication but not part of the sequence (images, CSS, fonts for example)

I am not sure I completely grasp the difference here. Both are a series of links, even if the targets play a different role. Moreover, the example for links includes references with an extra type attribute; which is fine, but doesn't this mean that it is possible to merge these two notions (thereby leaving the term manifest for the whole thing) and use the type attribute when some extra information is necessary?

iherman commented 8 years ago

I have a question for the 'metadata' section of the model. @HadrienGardeur says:

metadata: in this specific case, metadata about the collection with the usual things that you listed (title, author, publication date, publisher)

We need to be more crisp than that. Is it possible to say that the metadata part of the model is the set of RDF metadata terms that are relevant for the publication (or a rendition)? If we say that, does it also mean that, in the JSON representation of the model, that part of the information SHOULD (or MUST) be considered a JSON-LD section, with all the possible syntactic consequences thereof? Or do we intend to consider the whole thing as one big JSON-LD document (I would be a bit concerned about this)? I do not have a clear answer in my mind. Both approaches have a problem:

if we have a JSON-LD 'fragment', this means we have to have a special processing model/procedure to extract that part and hand it over to a bona fide JSON-LD processor
if the whole thing is JSON-LD, then we may have to pay a price in terms of additional syntactic complexity (e.g., the usage of @graph)

HadrienGardeur commented 8 years ago

I have a terminological issue; better take it now.

If we look ahead a bit in the direction of PWP (which I believe is what we have to do), then we have to be careful with the term "manifest". For me, the "manifest" should be the whole thing we are talking about here, and not some sub-part. This would be in line with the way the term is used on the Web, and it would help us align, in the long term, with the ongoing work on Web Manifests (even if that alignment will require additional work).

We inherit the term "manifest" from the IDPF and EPUB for that use case. I agree with you that we should use "manifest" for the whole thing, which means that we should probably rename the IDPF use of "manifest" to something else. Could we call it "resources" for example?

I am not sure I completely grasp the difference here. Both are a series of links, even if the targets play a different role. Moreover, the example for links includes references with an extra type attribute; which is fine, but doesn't this mean that it is possible to merge these two notions (thereby leaving the term manifest for the whole thing) and use the type attribute when some extra information is necessary?

You need to think about the global model, not just these two.

Everything in this new manifest is a collection. Collections are defined as:

role
  metadata
  links
  subcollection (role)

In links you'll find an array of link objects, currently each link object can have the following keys:

href
type
rel
title
properties
...

In order to make the syntax more compact, we've discussed with Dave the use of what we've called so far "compact collections". Such collections would only have links, and wouldn't allow subcollections or metadata. With these restrictions, the syntax for compact collections could be limited to an array of link objects (same as links).

In this model, spine and manifest are compact collections while renditions, preview or distributable-object are all full collections.

HadrienGardeur commented 8 years ago

We need to be more crisp than that. Is it possible to say that the metadata part of the model is the set of RDF metadata terms that are relevant for the publication (or a rendition)?

It's actually a set of RDF metadata terms that are relevant for the current collection. At the top level, metadata will be about the publication, while inside renditions it'll be about the rendition it's part of.

If we say that, does that also mean that, in the JSON representation of the model, that part of the information SHOULD (or MUST) be considered as a JSON-LD section, with all the possible syntactic consequences thereof? Or do we intend to consider the whole thing as one big JSON-LD (I would be a bit concerned about this)?

I previously proposed the use of a JSON-LD fragment but after discussions with you and others, dropped the idea and focused on creating a context that works for the whole document instead: https://gist.github.com/HadrienGardeur/03ab96f5770b0512233a That said, it doesn't cover all of the keys defined in the manifest, just a subset of them.

if we have a JSON-LD 'fragment', this means we have to have a special processing model/procedure to extract that part and hand it over to a bona fide JSON-LD processor

Agreed, that's part of the reason why I would rather avoid this.

if the whole thing is JSON-LD, then we may have to pay a price in terms of additional syntactic complexity (e.g., the usage of @graph)

That's why I would like to have additional restrictions and not allow full JSON-LD.

Maybe it's best to move this discussion to another issue?

iherman commented 8 years ago

On 16 Apr 2016, at 08:18, Hadrien Gardeur notifications@github.com wrote:

I have a terminological issue; better take it now.

If we look ahead a bit in direction of PWP (which I believe is what we have to do) then we have to be careful with the term "manifest". For me, the "manifest" should be the whole thing we are talking about here, and not some sub-part. This would be in line with the way the term is used on the Web, and it would help us align, again in the long term, with the ongoing work on Web Manifests https://www.w3.org/TR/appmanifest/ (even if that alignment will require additional work).

We inherit the term "manifest" from the IDPF and EPUB for that use case. I agree with you that we should use "manifest" for the whole thing, which means that we should probably rename the IDPF use of "manifest" to something else. Could we call it "resources" for example?

That term works for me.

iherman commented 8 years ago

I am not sure I completely grasp the difference here. Both are a series of links, even if the targets play a different role. Moreover, the example for links include references with an extra type attribute; which is fine, but doesn't this mean that it is possible to merge these two notions (thereby leaving the term manifest for the whole thing) and use the type attribute when some extra information is necessary?

You need to think about the global model, not just these two.

Everything in this new manifest is a collection. Collections are defined as:

role
  metadata
  links
  subcollection (role)

You mean, I presume, role=article, for example, right?

In links you'll find an array of link objects, currently each link object can have the following keys:

href
type
rel
title
properties
...

In order to make the syntax more compact, we've discussed with Dave the use of what we've called so far "compact collections". Such collections would only have links, and wouldn't allow subcollections or metadata. With these restrictions, the syntax for compact collections could be limited to an array of link objects.

In this model, spine and manifest are compact collections while renditions, preview or distributable-object are all full collections.

I am sorry, but I have no idea what you are talking about in terms of my original question: why make a distinction between links and manifest (in the current sense of the word)? As I said, they are both an array of links, with all kinds of attributes (keys). Why not simplify by merging these two terms? I also do not understand what you mean by "spine and manifest are compact collections"; you seem to introduce a model into the discourse that I did not see before.

dauwhe commented 8 years ago

I'm fine with resources instead of manifest, to denote publication resources that are not part of the spine. So for a typical EPUB, BFF spine + resources = EPUB classic's manifest.

I would also use spine and resources in renditions and collections, and reserve links for things like search, external files that are not part of the publication, etc.

iherman commented 8 years ago

So... @dauwhe's terminology made it clear for me; that is the answer I was waiting for:

I would also use spine and resources in renditions and collections, and reserve links for things like search, external files that are not part of the publication, etc.

(emphasis is mine.)

With the updated terminology, what we have is:

metadata:
  ...
spine:
  ...
resources:
  ...
links:
  ...

However, I think the terminology is misleading. What bothered me so far was that, on the one hand, I did not really understand the real difference between links and resources and, on the other hand, that the contents of resources are links, ie, hyperlinks, ie, URL-s. Sure, in EPUB the contents of resources are relative URL-s, whereas the contents of links may be absolute URL-s, but again looking ahead towards PWP (or simply WP, ie, Web Publication), this difference may become irrelevant in the future. In other words, the term "links" is very misleading imho.

If I connect this to the terminology we used in the PWP draft, what we now call resources, together with the spine, list the constituent resources of a PWP; whereas what we call links are the external references that are not part of the Publication per se. Ie, if I cache the Publication, or package it for a possible transfer, those are the non-essential resources that are not required to be added to the package, for example. If this model is correct, the terminology should rather be something like 'essential resources' and 'extra resources', ie, we get something like:

metadata:
   ...
spine:
   ...
essential resources:
   ...
extra resources:
   ...

I realize these terms are a mouthful, and not really good as JSON keys, but we are talking about the model for the time being, we can worry about the serialization/naming later.

Does this make sense, conceptually?

HadrienGardeur commented 8 years ago

Sorry, but it is actually inaccurate to say that links are for things that are not part of the publication.

You've rightfully identified that spine and manifest, two classic EPUB concepts, are in fact essentially links, something that was missing in the EPUB 3.x specifications. Because of that, EPUB 3.x has multiple elements that behave like link and could be replaced by a standard link element.

The very first thing that we did for BFF was to agree on a conceptual model that is detailed multiple times in the comments above: everything is a collection.

You've asked: why do we need to have separate terms for links and manifest? Well, the main reason is that manifest is a collection role: it tells the client processing the BFF that these links have a special purpose. In the case of spine, the client knows that these are the main constituents of the publication in reading order.

manifest looks like links for a simple reason: it's a collection that only has links and no metadata or subcollections. The full collection syntax would look like this:

manifest
  metadata
    [None]
  links
    Array of link objects
  subcollections (identified by their role)
    [None]

We've agreed that for such collections (that we call compact collections), it is allowed to limit their syntax to link objects to be more compact:

manifest
  Array of link objects

Up until recently we always used the full syntax instead but that bothered @dauwhe:

manifest
  links
    Array of link objects

This doesn't change the conceptual model though, they're still collections where the role identifies the semantics and links is used to list resources.
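The equivalence between the two forms can be illustrated with a minimal Python sketch that expands a compact collection (a bare array of link objects) into the full metadata/links/subcollections structure, so that a client only has to process one shape. The function name `normalize_collection`, the `subcollections` output key, and the rule that any other key not starting with `@` names a subcollection role are assumptions for illustration, not agreed BFF syntax.

```python
from typing import Any


def normalize_collection(value: Any) -> dict:
    """Expand a collection into its full form (illustrative sketch only).

    Assumption: a compact collection is a bare JSON array of link objects,
    while a full collection is an object that may carry "metadata", "links"
    and role-keyed subcollections.
    """
    if isinstance(value, list):
        # Compact syntax: links only, no metadata, no subcollections.
        return {"metadata": {}, "links": list(value), "subcollections": {}}
    if isinstance(value, dict):
        full = {
            "metadata": dict(value.get("metadata", {})),
            "links": list(value.get("links", [])),
            "subcollections": {},
        }
        for role, sub in value.items():
            # Generic keys and JSON-LD keywords (e.g. "@context") are not roles.
            if role in ("metadata", "links") or role.startswith("@"):
                continue
            full["subcollections"][role] = normalize_collection(sub)
        return full
    raise TypeError(f"not a collection: {value!r}")


# Both spellings of the same collection normalize to the same structure.
compact = [{"href": "style.css", "type": "text/css"}]
full = {"links": [{"href": "style.css", "type": "text/css"}]}
assert normalize_collection(compact) == normalize_collection(full)
```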

Like metadata, links are tied to a specific collection and their purpose is relative to the role of that collection.

For example, in publication:

metadata is used to provide the publication's metadata
links is used for external resources such as a link to a classic EPUB or a link to an external metadata record

While in spine, links point to all the resources that should be displayed in reading order.

Think of both metadata and links as properties, respectively meant to list metadata and to list resources, while the collection role provides the semantics. To further align with JSON-LD, we could actually call them @metadata and @links since they play a special role in our syntax.

iherman commented 8 years ago

@HadrienGardeur, sorry but I continue to be confused. We are talking about a model and a proper terminology, and you seem to use terms, possibly changing them on the fly, that are not properly defined. You also presume here and there that "we have agreed" but I do not see any evidence for that in the thread other than your own proposals. My confusion is still such that I certainly cannot just agree with all this until I understand it.

I will try to provide a brain dump here, using an ad-hoc notation. Whether it is mine or yours, we will see :-) But this seems to be what you are proposing for a terminology:

link objects: a structure consisting of a URI and possible attributes on the URI
collection: a data structure identified by a 'role', and consisting of
- links: array of link objects
- metadata: essentially key value pairs (or RDF statements?)
- further collections, recursively
As an abuse of notation, unless noted otherwise, everything is a collection, ie, each term below identifies a collection with the term name being the role of the collection
Collections with specific roles may be restricted insofar as they may contain no metadata or no further collections. Examples are spine, resources, extra_resources

Using this, we have, as a fundamental model, adding a ? for the optional collections

manifest:
    spine               # provides the reading order
    resources?          # essential resources for the publication
    extra_resources?    # non-essential resources for the publication
    ...                 # extra collections, like alternative language, renditions, etc

I am not interested, at this point, in how these look in JSON-LD, and where such a serialization can be simplified. Instead: is this what you mean? Because if that is indeed what you mean, we have a basis for discussing whether we agree or not...
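As a rough check of the model as written, here is a hypothetical Python data structure for it: a collection has a role, metadata, links and nested collections; some roles are restricted to links only; and a manifest must contain a spine (the `?` marks the other collections as optional). The `Link` and `Collection` classes and the `RESTRICTED_ROLES` set are illustrative assumptions taken from the examples in this thread, not part of any specification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Roles that, in the model above, may hold links only (no metadata, no nested
# collections). This set is an assumption drawn from the thread's examples.
RESTRICTED_ROLES = {"spine", "resources", "extra_resources"}


@dataclass
class Link:
    href: str
    type: str = ""
    rel: str = ""
    title: str = ""


@dataclass
class Collection:
    role: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)
    collections: List["Collection"] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return the model violations found in this collection tree."""
        errors = []
        if self.role in RESTRICTED_ROLES:
            if self.metadata:
                errors.append(f"{self.role}: metadata not allowed")
            if self.collections:
                errors.append(f"{self.role}: subcollections not allowed")
        # spine is the only collection not marked optional with "?".
        if self.role == "manifest" and not any(c.role == "spine" for c in self.collections):
            errors.append("manifest: spine is required")
        for sub in self.collections:
            errors.extend(sub.validate())
        return errors


manifest = Collection(
    role="manifest",
    metadata={"title": "Moby-Dick"},
    collections=[
        Collection(role="spine", links=[Link("chapter1.html", "text/html")]),
        Collection(role="resources", links=[Link("style.css", "text/css")]),
    ],
)
assert manifest.validate() == []
```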

HadrienGardeur commented 8 years ago

"manifest", "spine", "collection" and quite a few others are terms that are already defined and used in the EPUB specification, so sorry but I'm not "using terms, possibly changing them on the fly, that are not properly defined". We've talked about calling "manifest" something else but that's the one and only term for which we're changing anything on the fly (and you actually started that thread).

For the discussion about the data model, we've talked about it multiple times both during BFF and general 3.1 calls and yes, there was a consensus specifically about it.

That said, your list is correct and the model that you're describing too except that there's already something defined for "extra_resources" in EPUB 3.0.1 and that's "links".

dauwhe commented 8 years ago

Perhaps one thing we can usefully discuss is the difference between what we're calling resources and extra_resources or links. I've imagined resources as, in classic EPUB terms, all the stuff that's in the manifest but not the spine.

But there seem to be several types of things in extra_resources or links. Hadrien's examples in the google doc include:

links to a search API
links to OPDS catalog feeds
links to encryption or XML signatures
true alternative versions of the publication (rendition?), such as a link to a classic EPUB version

Separately, in the PWP work we've discussed that some publication resources may not be strictly essential to view the publication in one state or the other. For example, certain fonts may not be required to understand the content, or it may not be desirable to download a huge video file for offline viewing. Would such things be extra_resources?

iherman commented 8 years ago

On 17 Apr 2016, at 10:07, Hadrien Gardeur notifications@github.com wrote:

That said, your list is correct and the model that you're describing too

Good. We have a basis for mutual understanding. Actually, the structure is not that far away from what I described in my previous comment (https://github.com/dauwhe/epub31-bff/issues/13#issuecomment-210997560) on the role of resources and extra resources.

except that there's already something defined for "extra_resources" in EPUB 3.0.1 and that's "links".

Right. And I continue to be against using that term for this. The term 'link' is overloaded:

We have link structures
We have the 'links' as part of a collection
We then have 'links' as part of the top level manifest

I think it is a bad idea to overload the term, hence my preference for something like "extra resources" (I am not bound to that specific term; we can try to find something different)

iherman commented 8 years ago

On 17 Apr 2016, at 13:16, Dave Cramer notifications@github.com wrote:

Perhaps one thing we can usefully discuss is the difference between what we're calling resources and extra_resources or links. I've imagined resources as, in classic EPUB terms, all the stuff that's in the manifest but not the spine.

Yep for me.

But there seem to be several types of things in extra_resources or links. Hadrien's examples in the google doc include:

links to a search API
links to OPDS catalog feeds
links to encryption or XML signatures
true alternative versions of the publication (rendition?), such as a link to a classic EPUB version

Separately, in the PWP work we've discussed that some publication resources may not be strictly essential to view the publication in one state or the other. For example, certain fonts may not be required to understand the content, or it may not be desirable to download a huge video file for offline viewing. Would such things be extra_resources?

That is exactly what I was saying: for me, that was an extra resource. That being said, your list includes entries that are essential (eg, XML signatures).

I actually still do not understand how the entries on the list above ended up as an extra resource as opposed to a resource. Ie, you are absolutely right: this deserves a much more precise definition

HadrienGardeur commented 8 years ago

Perhaps one thing we can usefully discuss is the difference between what we're calling resources and extra_resources or links. I've imagined resources as, in classic EPUB terms, all the stuff that's in the manifest but not the spine.

That's also my understanding of what "resources" are, since we've simply mentioned renaming "manifest" into something else, not changing its semantics.

But there seem to be several types of things in extra_resources or links. Hadrien's examples in the google doc include:

links to a search API

links to OPDS catalog feeds

links to encryption or XML signatures

true alternative versions of the publication (rendition?), such as a link to a classic EPUB version

Yes these are all good examples. You can identify what a link is about by looking at its rel and media type:

rel="search" for search, which could either point to an OpenSearch Document (application/opensearchdescription+xml) or directly to HTML and use a URI template
rel="record" or rel="related" depending on whether you're pointing to the OPDS entry for the current publication or just some related OPDS feed
for encryption and XML signatures, that's something that we've mentioned as a possibility but nothing really concrete has been discussed/decided at this point
rel="alternate" to link to a classic EPUB (with type set to application/epub+zip), but it's even more important to always include a rel="self" for the canonical locator to the manifest

links is meant to be used at a publication level like the link element in HTML or Atom, or the Link header in HTTP.
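A client could dispatch on rel and media type roughly as in the following Python sketch. The helper names, the returned descriptions, and the treatment of `rel` as a space-separated list (as in HTML and Atom) are assumptions; only the rel values and media types themselves come from the comment above.

```python
from typing import Dict, List, Optional


def rels_of(link: Dict[str, str]) -> List[str]:
    # Assumption: as in HTML and Atom, "rel" may hold space-separated values.
    return link.get("rel", "").split()


def describe_link(link: Dict[str, str]) -> str:
    """Guess what a publication-level link is for (illustrative only)."""
    rels = rels_of(link)
    media_type = link.get("type", "")
    if "self" in rels:
        return "canonical location of this manifest"
    if "search" in rels:
        if media_type == "application/opensearchdescription+xml":
            return "OpenSearch description document"
        return "HTML search using a URI template"
    if "record" in rels:
        return "OPDS entry for this publication"
    if "related" in rels:
        return "related OPDS feed"
    if "alternate" in rels and media_type == "application/epub+zip":
        return "classic EPUB version"
    return "unknown"


def find_self(links: List[Dict[str, str]]) -> Optional[str]:
    """Return the href of the first rel="self" link, or None if absent."""
    for link in links:
        if "self" in rels_of(link):
            return link.get("href")
    return None


links = [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
    {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"},
]
assert find_self(links) == "http://example.org/bff.json"
assert describe_link(links[1]) == "classic EPUB version"
```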

Separately, in the PWP work we've discussed that some publication resources may not be strictly essential to view the publication in one state or the other. For example, certain fonts may not be required to understand the content, or it may not be desirable to download a huge video file for offline viewing. Would such things be extra_resources?

There are two extension mechanisms that we can rely on:

either declare a new relationship to identify such resources in our link object
or declare a new role for a collection

Using a collection makes sense when we need to group multiple link objects together (for example, this is absolutely necessary for the spine since we need to order them) and/or provide metadata or specific semantics for that group. I believe that what you're describing here is a new collection with slightly different semantics than resources (manifest in classic EPUB), but you could also use properties to identify that difference and list them all under resources.

But what you're describing here is clearly not as generic as our use case for links (which is useful to list all resources associated with a specific collection).
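The two options for resources that are not essential offline (a new collection role, or a property on entries listed under resources) can be compared with a small Python sketch that computes what a client would cache. Both the `extra_resources` role and the `"non-essential"` property value are hypothetical names, and `properties` is assumed to be an array of strings; none of this has been defined.

```python
from typing import Dict, List


def files_to_cache(manifest: Dict) -> List[str]:
    """Collect the hrefs a client would store for offline reading.

    Handles both hypothetical encodings:
      * option A: non-essential files sit in a separate "extra_resources"
        collection, which is simply never read;
      * option B: they stay in "resources" but carry "non-essential" in
        their "properties" array, and are filtered out.
    """
    hrefs = [link["href"] for link in manifest.get("spine", [])]
    for link in manifest.get("resources", []):
        if "non-essential" not in link.get("properties", []):
            hrefs.append(link["href"])
    return hrefs


option_a = {
    "spine": [{"href": "chapter1.html"}],
    "resources": [{"href": "style.css"}],
    "extra_resources": [{"href": "video.mp4"}],
}
option_b = {
    "spine": [{"href": "chapter1.html"}],
    "resources": [
        {"href": "style.css"},
        {"href": "video.mp4", "properties": ["non-essential"]},
    ],
}
assert files_to_cache(option_a) == files_to_cache(option_b) == ["chapter1.html", "style.css"]
```

Either way the cached set is the same; the difference is whether the distinction is visible at the collection level (option A) or only on individual link objects (option B).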

HadrienGardeur commented 8 years ago

Right. And I continue to be against the usage of the terms for this. The term 'link' is overloaded:

We have link structures

We have the 'links' as part of a collection

We then have 'links' as part of the top level manifest

We only have one links that is meant to list link objects for a collection. In XML, there wouldn't be any difference between links and link objects (in EPUB 3.0.1 we use the link element for that), this is only necessary in JSON.

links as part of a collection is exactly the same as links for the top level manifest. We don't list the role of the top level manifest (it could be called manifest or publication) but it follows the syntax of a collection too.

Once again, we're not inventing anything new here. "collection", "metadata" and "link" are all part of EPUB 3.0.1 with the exact same semantics and use cases. We're simply extending the use of collections to the whole manifest and not just a small part of it.

iherman commented 8 years ago

@HadrienGardeur:

Right. And I continue to be against the usage of the terms for this. The term 'link' is overloaded:
   We have link structures
   We have the 'links' as part of a collection
   We then have 'links' as part of the top level manifest
We only have one links that is meant to list link objects for a collection. In XML, there wouldn't be any difference between links and link objects (in EPUB 3.0.1 we use the link element for that), this is only necessary in JSON.

links as part of a collection is exactly the same as links for the top level manifest. We don't list the role of the top level manifest (it could be called manifest or publication) but it follows the syntax of a collection too.

We agreed that the model described in an earlier comment is correct. According to that model:

Within the top level manifest (and probably in a rendition) what you call 'links' and what I call 'extra resources' identifies a collection through a specific role name
In the precise definition of a collection, the term 'link' is used to identify an array of link objects within a collection

These two notions are not the same; they are two distinct concepts, and the fact that they are different is actually a consequence of the proposal whereby 'everything is a collection'. We can revisit that approach (I am not yet, personally, convinced that this 'everything is a collection' is indeed a helpful approach), but as long as we remain within the model, this is the way it is. And, as a consequence, we are overloading the term 'links' in the model which, I believe, is bad practice.

We are repeating ourselves, though; I guess we have to agree that we disagree on that point; I remain opposed to the dual usage of the term in the model.

Once again, we're not inventing anything new here. "collection", "metadata" and "link" are all part of EPUB 3.0.1 with the exact same semantics and use cases. We're simply extending the use of collections to the whole manifest and not just a small part of it.

This is not really relevant. As you yourself said and proposed several times, the BFF work leads to a precise model of what an EPUB document is. Defining that model may lead to terminological refinements; that is the goal of the whole exercise.

HadrienGardeur commented 8 years ago

You've said it yourself: what you call "extra resources" identifies a collection through a specific role name, therefore it's not the same as "links" (read my previous replies, I also suggested that this is a new collection role and therefore quite different from "links").

The top-level document is also a collection, it even follows the syntax for collections:

metadata
links
subcollections (spine, resources, renditions, extra_resources...)

We're not overloading "links"; its purpose remains exactly the same.

Your proposal even follows the standard extension mechanism for our model (new collection role).

iherman commented 8 years ago

@HadrienGardeur, we are talking past one another, then.

In my description of the model:

manifest:
    spine               # provides the reading order
    resources?          # essential resources for the publication
    extra_resources?    # non-essential resources for the publication
    ...                 # extra collections, like alternative language, renditions, etc

extra resources is meant to be an extra (sub)collection, whose role name is, well, extra resources.

You then said in your comments

That said, your list is correct and the model that you're describing too except that there's already something defined for "extra_resources" in EPUB 3.0.1 and that's "links"

which could be misunderstood to indicate that 'links' should be used for that extra (sub)collection; this is certainly the way I understood it. Whereas, in fact, you seem to want to refer to 'links' as the concept within a collection which is available for a manifest, because it is itself a collection. Ie, we then have two notions, namely 'links' and 'extra resources' (actually three, because 'resources' is also around) and, at this point, I have no idea what differentiates these from one another. Which gets us back to @dauwhe's comment whereby we have to describe what these really contain and mean, because it is certainly not clear in my mind, nor, I presume, in @dauwhe's.

Well... the purpose of any model is to clarify terminologies and structures and provide an easy-to-follow common language; well, at this point, it has failed in our case. Maybe it is worth looking at this model again (eg, the 'everything is a collection' approach) which has clearly failed us. At least it has failed me.

HadrienGardeur commented 8 years ago

The initial comment about extra_resources (which is a new term and concept which we haven't used before) wasn't completely clear to me, but with further comments from both you and @dauwhe I now understand that you both mean a new collection role.

Keep in mind too that while you've proposed this idea, I'm not entirely convinced yet that we need it. At this point, I'm just figuring out how it would work with our existing model (works fine).

If we limit the model to what we've already defined, I can edit your model proposal:

manifest:
    metadata        # generic element to list metadata about a collection
    links           # generic element to list link objects associated to a collection
    spine           # provides the reading order
    resources?      # essential resources for the publication ("manifest" in classic EPUB)
    ...             # extra collections, like preview, distributable-object, index, etc

HadrienGardeur commented 8 years ago

I've updated the Gist to provide full examples based on what we've discussed:

{
  "@context": "http://idpf.org/epub.jsonld",

  "metadata": {
    "@type": "http://schema.org/Book",
    "identifier": "urn:isbn:9780000000001",
    "title": "Moby-Dick",
    "author": "Herman Melville",
    "language": "en",
    "publisher": "Whale Publishing Ltd.",
    "modified": "2016-02-18T10:32:18Z",
    "description": "Moby-Dick recounts the adventures of the narrator Ishmael as he sails on the whaling ship Pequod under the command of Captain Ahab."
  },

  "links": [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"},
    {"rel": "alternate", "href": "http://example.org/publication.epub", "type": "application/epub+zip"}
  ],

  "spine": [
    {"href": "cover.svg", "type": "image/svg+xml", "rel": "cover-image", "title": "Cover"},
    {"href": "chapter1.html", "type": "text/html", "title": "Chapter 1"},
    {"href": "chapter2.html", "type": "text/html", "title": "Chapter 2"}
  ],
  "resources": [{"href": "style.css", "type": "text/css"}]
}
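To show that the top-level document can be read with the same collection logic, here is a Python sketch that parses an abbreviated copy of the example above, separates the generic `metadata` and `links` keys from role-keyed subcollections, and applies @dauwhe's rule of thumb that spine + resources corresponds to the classic EPUB manifest. Skipping keys that start with `@` (such as `@context`) and the helper name `split_collection` are assumptions of this sketch.

```python
import json

# Abbreviated copy of the Gist example above.
DOC = """{
  "@context": "http://idpf.org/epub.jsonld",
  "metadata": {"title": "Moby-Dick", "language": "en"},
  "links": [
    {"rel": "self", "href": "http://example.org/bff.json", "type": "application/epub+json"}
  ],
  "spine": [
    {"href": "cover.svg", "type": "image/svg+xml", "rel": "cover-image"},
    {"href": "chapter1.html", "type": "text/html"},
    {"href": "chapter2.html", "type": "text/html"}
  ],
  "resources": [{"href": "style.css", "type": "text/css"}]
}"""

# Keys with a fixed meaning in every collection.
GENERIC_KEYS = {"metadata", "links"}


def split_collection(doc: dict):
    """Separate a collection's generic parts from its role-keyed subcollections."""
    subcollections = {
        role: value
        for role, value in doc.items()
        if role not in GENERIC_KEYS and not role.startswith("@")
    }
    return doc.get("metadata", {}), doc.get("links", []), subcollections


publication = json.loads(DOC)
metadata, links, subs = split_collection(publication)
print(sorted(subs))  # ['resources', 'spine']

# Rule of thumb: BFF spine + resources = classic EPUB manifest.
classic_manifest = [item["href"] for item in subs["spine"] + subs["resources"]]
print(classic_manifest)
```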
