Open kingjon3377 opened 3 years ago
These epub tools are designed for working with EPUB metadata that sits inside the <manifest> section of an EPUB's manifest.xml document.
EPUB metadata is defined by these schemas:
EPUB2: http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm
EPUB3: http://www.idpf.org/epub/30/spec/epub30-publications.html
The tag you're describing, meta name="calibre:series", is not part of the specification of these book file formats. It's something added by the third-party book management tool Calibre and isn't recognized by IDPF as part of the EPUB standard. I don't know for sure, but I'm guessing that Calibre has done this to an EPUB2 book, which has no way to store this type of series information.
The right thing to do here (although I find it unlikely it will be done en masse to existing EPUB2s) is for books to be converted to the EPUB3 format. This format has a very different and flexible system that allows several title tags to be defined and marked up with refinements. For example, one refinement is called title-type and could be set to collection (which is like the case you're describing, a series of books that belong together).
If you're curious, here's the relevant specification for the title tagging for EPUB3: http://idpf.org/epub/30/spec/epub30-publications.html#sec-opf-dctitle
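For illustration, here is a minimal Python sketch of how title-type refinements could be read out of an EPUB3 package document. The sample OPF, its titles, and the titles_by_type helper are made up for this example; only the dc:title / refines / title-type shape comes from the spec linked above. This is not what epub-tools does today.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical EPUB3 package document, trimmed to the relevant parts.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title id="t1">Example Book</dc:title>
    <meta refines="#t1" property="title-type">main</meta>
    <dc:title id="t2">Example Series</dc:title>
    <meta refines="#t2" property="title-type">collection</meta>
  </metadata>
</package>"""


def titles_by_type(opf_xml):
    """Map each title-type refinement value to the title text it refines."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    # Index the titles by id so that refinements can be matched to them.
    titles = {
        t.get("id"): (t.text or "").strip()
        for t in metadata.findall("dc:title", NS)
    }
    result = {}
    for meta in metadata.findall("opf:meta", NS):
        target = (meta.get("refines") or "").lstrip("#")
        if meta.get("property") == "title-type" and target in titles:
            result[(meta.text or "").strip()] = titles[target]
    return result


print(titles_by_type(SAMPLE_OPF))
```

A title marked as collection is the closest thing in this scheme to a series name.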
To be honest, my software isn't even parsing this level of complexity out of EPUB3 because hardly anyone was using it back when I was putting EPUB3 support in. It would be great if all this and more were added to epub-tools some day.
There are tools out there for converting from EPUB2 to EPUB3. I wonder if, when Calibre is given an EPUB3 file, it will fill in the correct tags for books in a series.
In the meantime, I'd say this issue should stay open because this all definitely needs better EPUB3 support.
That's a reasonable perspective to take. However, a quick scan through the EPUB2 spec finds this passage:
One or more optional instances of a meta element, analogous to the XHTML 1.1 meta element but applicable to the publication as a whole, may be placed within the metadata element or within the deprecated x-metadata element. This allows content providers to express arbitrary metadata beyond the data described by the Dublin Core specification.
The <meta name="calibre:series" tag is within the <metadata> tag.
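To make that concrete, here is a rough Python sketch that collects every name/content pair from <meta> elements under <metadata> in an EPUB2-style OPF. The sample document and the named_meta helper are hypothetical, and the calibre:series_index value is an assumed one chosen for the example.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical EPUB2 package document; series_index value is assumed.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Story</dc:title>
    <meta name="calibre:series" content="New Hope"/>
    <meta name="calibre:series_index" content="2"/>
  </metadata>
</package>"""


def named_meta(opf_xml):
    """Return {name: content} for every <meta name=...> inside <metadata>."""
    root = ET.fromstring(opf_xml)
    pairs = {}
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        name = meta.get("name")
        if name is not None:
            pairs[name] = meta.get("content", "")
    return pairs


print(named_meta(SAMPLE_OPF))
```

Nothing here is Calibre-specific: any tool's name/content pairs would come out the same way.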
(It looks like the Archive Of Our Own is using Calibre tooling to generate downloadable ebooks on demand, as the one I pointed to has this in its NCX: <meta content="calibre (3.39.1)" name="dtb:generator"/>.)
(And FWIW, when Calibre converts this EPUB to an EPUB3, the series information is still represented using <meta tags within the <metadata> element, but now using a form that's not calibre: namespaced and that more closely resembles what's shown in the EPUB3 spec:
<meta property="belongs-to-collection" id="id-3">New Hope</meta>
<meta refines="#id-3" property="collection-type">series</meta>
<meta refines="#id-3" property="group-position">2</meta>
But still not included in the default output of epubmeta.)
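As a sketch of how those three lines could be resolved, the Python below groups each belongs-to-collection entry with the <meta> elements that refine it. The collections helper and the surrounding <package> wrapper are assumptions made for the example; the three <meta> lines are the ones quoted above.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# The three <meta> lines from the converted EPUB3, in a minimal wrapper.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="belongs-to-collection" id="id-3">New Hope</meta>
    <meta refines="#id-3" property="collection-type">series</meta>
    <meta refines="#id-3" property="group-position">2</meta>
  </metadata>
</package>"""


def collections(opf_xml):
    """Return one dict per collection, with its refinements merged in."""
    root = ET.fromstring(opf_xml)
    metas = root.findall("opf:metadata/opf:meta", OPF_NS)
    found = {}
    # First pass: find the collections themselves, keyed by id.
    for meta in metas:
        if meta.get("property") == "belongs-to-collection" and meta.get("id"):
            found[meta.get("id")] = {"name": (meta.text or "").strip()}
    # Second pass: attach refinements, whatever order they appear in.
    for meta in metas:
        target = (meta.get("refines") or "").lstrip("#")
        if target in found:
            found[target][meta.get("property")] = (meta.text or "").strip()
    return list(found.values())


print(collections(SAMPLE_OPF))
```

Two passes are used so that a refinement listed before the element it refines is still picked up.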
Basically, what I expected was for the default output of epubmeta to include some representation of everything under the <metadata> element that I would see if I used epubmeta -e. Special handling of common-extension cases like series name and position (e.g. turning calibre:series Series Name and calibre:series_index 1 into series: Series Name #1) might be nice to have, but certainly isn't necessary so long as information from <meta> tags isn't simply ignored.
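The special handling I have in mind could be as small as the following Python sketch. The series_line function and its dict argument (a mapping from <meta> name to content) are hypothetical, not anything epubmeta provides.

```python
def series_line(meta):
    """Format a 'series: ...' line from name/content pairs, or return None.

    `meta` is assumed to be a dict mapping <meta> name attributes to their
    content attributes, e.g. {"calibre:series": "Series Name"}.
    """
    name = meta.get("calibre:series")
    if not name:
        return None
    index = meta.get("calibre:series_index")
    # The position is optional; omit the '#N' suffix when it is missing.
    return f"series: {name} #{index}" if index else f"series: {name}"


print(series_line({"calibre:series": "Series Name", "calibre:series_index": "1"}))
```

When no series name is present the function returns None, so a caller can simply skip the line.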
Ok, you've convinced me. I think this should be done at some point. Plus I should fill out more of the EPUB3 refinements that are in the spec. It requires support in the epub-metadata library so depends now on https://github.com/dino-/epub-metadata/issues/12
Even with -v, the output of epubmeta path/to/ebook.epub does not include some pieces of metadata.
For my particular use-case (identifying which ebooks downloaded from AO3 are in a series) I find that the information I want is in <meta tags with the name field's value in the calibre namespace.
For example, the OPF for the EPUB version of this story includes this element:
<meta name="calibre:series" content="New Hope"/>
So when I run epubmeta on it without -e (and ideally without -v), I would like to see a line like series: New Hope
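For anyone who wants to check an ebook by hand in the meantime, here is a rough Python sketch that pulls the OPF out of an EPUB so the <meta> tags can be inspected. The read_opf helper is made up for this example. It relies on the standard EPUB container layout, where META-INF/container.xml names the package document in a rootfile element's full-path attribute.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def read_opf(epub):
    """Return the package document (OPF) bytes from an EPUB path or file object."""
    with zipfile.ZipFile(epub) as zf:
        # container.xml points at the OPF via <rootfile full-path="...">.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return zf.read(rootfile.get("full-path"))
```

Calling read_opf("path/to/ebook.epub").decode("utf-8") and searching the result for calibre:series should show whether the element is there, assuming the OPF is UTF-8 encoded.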