aerkalov / ebooklib

Python E-book library for handling books in EPUB2/EPUB3 format -
https://ebooklib.readthedocs.io/
GNU Affero General Public License v3.0

How to fetch all the metadata? #223

Open fgallaire opened 3 years ago

fgallaire commented 3 years ago

When reading an EPUB, how can I fetch all the metadata (as something like a dict)? Or at least get the list of names that can be passed to get_metadata()?

aerkalov commented 3 years ago

They are all in a metadata dictionary. Just access book.metadata. Entries are grouped by namespace, and under each namespace every name maps to a list, because a name can have multiple values. Each item in the list is a (value, attributes) tuple.
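The layout described above (namespace, then name, then a list of (value, attributes) tuples) can be walked with ordinary dict operations to get the list of available names. Here is a minimal sketch, assuming a small hand-written dict stands in for book.metadata. The helper name list_metadata_names is hypothetical and not part of ebooklib.

```python
# Stand-in for book.metadata, shaped as described above:
# {namespace: {name: [(value, attributes), ...]}}
metadata = {
    "http://purl.org/dc/elements/1.1/": {
        "title": [("Pride and Prejudice", {})],
        "subject": [("Love stories", {}), ("Domestic fiction", {})],
    },
    "http://www.idpf.org/2007/opf": {
        "cover": [(None, {"name": "cover", "content": "item1"})],
    },
}


def list_metadata_names(metadata):
    """Return sorted (namespace, name) pairs present in the metadata dict.

    Hypothetical helper for illustration; not an ebooklib function.
    """
    return sorted(
        (namespace, name)
        for namespace, names in metadata.items()
        for name in names
    )


for namespace, name in list_metadata_names(metadata):
    # Each entry is a list because a name can carry several values.
    values = [value for value, _attrs in metadata[namespace][name]]
    print(f"{namespace} {name}: {values}")
```

Each (namespace, name) pair produced this way is the kind of argument pair that get_metadata(namespace, name) expects.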

For instance, this is the metadata in the book:

<package xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
  <metadata>
    <dc:rights>Public domain in the USA.</dc:rights>
    <dc:identifier opf:scheme="URI" id="id">http://www.gutenberg.org/1342</dc:identifier>
    <dc:creator opf:file-as="Austen, Jane">Jane Austen</dc:creator>
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language xsi:type="dcterms:RFC4646">en</dc:language>
    <dc:subject>England -- Fiction</dc:subject>
    <dc:subject>Young women -- Fiction</dc:subject>
    <dc:subject>Love stories</dc:subject>
    <dc:subject>Sisters -- Fiction</dc:subject>
    <dc:subject>Domestic fiction</dc:subject>
    <dc:subject>Courtship -- Fiction</dc:subject>
    <dc:subject>Social classes -- Fiction</dc:subject>
    <dc:date opf:event="publication">1998-06-01</dc:date>
    <dc:date opf:event="conversion">2021-02-10T19:00:08.880191+00:00</dc:date>
    <dc:source>https://www.gutenberg.org/files/1342/1342-h/1342-h.htm</dc:source>
    <meta name="cover" content="item1"/>
  </metadata>
...
</package>

This is what you get after parsing:

{
  'http://www.idpf.org/2007/opf': {
    'cover': [(None, {'name': 'cover', 'content': 'item1'})]
  },
  'http://purl.org/dc/elements/1.1/': {
    'rights': [('Public domain in the USA.', {})],
    'identifier': [('http://www.gutenberg.org/1342', {'{http://www.idpf.org/2007/opf}scheme': 'URI', 'id': 'id'})],
    'creator': [('Jane Austen', {'{http://www.idpf.org/2007/opf}file-as': 'Austen, Jane'})],
    'title': [('Pride and Prejudice', {})],
    'language': [('en', {'{http://www.w3.org/2001/XMLSchema-instance}type': 'dcterms:RFC4646'})],
    'subject': [
      ('England -- Fiction', {}),
      ('Young women -- Fiction', {}),
      ('Love stories', {}),
      ('Sisters -- Fiction', {}),
      ('Domestic fiction', {}),
      ('Courtship -- Fiction', {}),
      ('Social classes -- Fiction', {})
    ],
    'date': [
      ('1998-06-01', {'{http://www.idpf.org/2007/opf}event': 'publication'}),
      ('2021-02-10T19:00:08.880191+00:00', {'{http://www.idpf.org/2007/opf}event': 'conversion'})
    ],
    'source': [('https://www.gutenberg.org/files/1342/1342-h/1342-h.htm', {})]
  },
  'http://purl.org/dc/terms/': {},
  'http://www.w3.org/2001/XMLSchema-instance': {}
}

fgallaire commented 3 years ago

Would it be possible to have a programmer-friendly dict instead of this shitty XML-shaped one? Why not book.metadata["description"] instead of book.metadata["http://purl.org/dc/elements/1.1/"]["description"]? (Then there would be no more need for get_metadata(namespace, name).)
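A flat view like the one requested here can be built on the caller's side from the nested dict. Below is a minimal sketch, assuming the {namespace: {name: [(value, attributes), ...]}} layout shown earlier in this thread. The function name flatten_metadata and the sample dict are hypothetical and not part of ebooklib.

```python
def flatten_metadata(metadata):
    """Merge all namespaces into one {name: [values]} dict.

    Hypothetical helper for illustration; not an ebooklib function.
    Attributes are dropped, and if the same name appears in two
    namespaces, the values are concatenated into one list.
    """
    flat = {}
    for names in metadata.values():
        for name, entries in names.items():
            flat.setdefault(name, []).extend(value for value, _attrs in entries)
    return flat


# Hand-written stand-in for book.metadata.
nested = {
    "http://www.idpf.org/2007/opf": {
        "cover": [(None, {"name": "cover", "content": "item1"})],
    },
    "http://purl.org/dc/elements/1.1/": {
        "title": [("Pride and Prejudice", {})],
        "subject": [("Love stories", {}), ("Domestic fiction", {})],
    },
}

flat = flatten_metadata(nested)
print(flat["title"])
print(flat["subject"])
```

The trade-off is that this flat form discards the namespaces and the per-value attributes. An entry such as cover, whose information lives entirely in its attributes, ends up with only a None value.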