Open fgallaire opened 3 years ago
They are in a metadata dictionary. Just access book.metadata. They are grouped by namespace. Values are in a list because a field can have multiple values.
For instance, this is the metadata in the book:
<package xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
<metadata>
<dc:rights>Public domain in the USA.</dc:rights>
<dc:identifier opf:scheme="URI" id="id">http://www.gutenberg.org/1342</dc:identifier>
<dc:creator opf:file-as="Austen, Jane">Jane Austen</dc:creator>
<dc:title>Pride and Prejudice</dc:title>
<dc:language xsi:type="dcterms:RFC4646">en</dc:language>
<dc:subject>England -- Fiction</dc:subject>
<dc:subject>Young women -- Fiction</dc:subject>
<dc:subject>Love stories</dc:subject>
<dc:subject>Sisters -- Fiction</dc:subject>
<dc:subject>Domestic fiction</dc:subject>
<dc:subject>Courtship -- Fiction</dc:subject>
<dc:subject>Social classes -- Fiction</dc:subject>
<dc:date opf:event="publication">1998-06-01</dc:date>
<dc:date opf:event="conversion">2021-02-10T19:00:08.880191+00:00</dc:date>
<dc:source>https://www.gutenberg.org/files/1342/1342-h/1342-h.htm</dc:source>
<meta name="cover" content="item1"/>
</metadata>
...
</package>
This is what you get after parsing:
{
'http://www.idpf.org/2007/opf':
{'cover': [(None, {'name': 'cover', 'content': 'item1'})]},
'http://purl.org/dc/elements/1.1/':
{'rights': [('Public domain in the USA.', {})],
'identifier': [('http://www.gutenberg.org/1342', {'{http://www.idpf.org/2007/opf}scheme': 'URI', 'id': 'id'})],
'creator': [('Jane Austen', {'{http://www.idpf.org/2007/opf}file-as': 'Austen, Jane'})],
'title': [('Pride and Prejudice', {})],
'language': [('en', {'{http://www.w3.org/2001/XMLSchema-instance}type': 'dcterms:RFC4646'})],
'subject': [('England -- Fiction', {}), ('Young women -- Fiction', {}), ('Love stories', {}), ('Sisters -- Fiction', {}), ('Domestic fiction', {}), ('Courtship -- Fiction', {}), ('Social classes -- Fiction', {})],
'date': [('1998-06-01', {'{http://www.idpf.org/2007/opf}event': 'publication'}), ('2021-02-10T19:00:08.880191+00:00', {'{http://www.idpf.org/2007/opf}event': 'conversion'})],
'source': [('https://www.gutenberg.org/files/1342/1342-h/1342-h.htm', {})]},
'http://purl.org/dc/terms/': {},
'http://www.w3.org/2001/XMLSchema-instance': {}
}
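To make the shape of that dictionary concrete, here is a minimal sketch that lists every (namespace, name) pair and pulls out the plain values. It runs on a trimmed copy of the output above rather than on a real book, and `list_names` and `values` are hypothetical helper names, not functions the library provides.

```python
# Trimmed copy of the parsed metadata shown above; in real use this would
# presumably be book.metadata (an assumption, nothing is read from disk here).
DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

metadata = {
    OPF: {"cover": [(None, {"name": "cover", "content": "item1"})]},
    DC: {
        "title": [("Pride and Prejudice", {})],
        "creator": [("Jane Austen", {"{%s}file-as" % OPF: "Austen, Jane"})],
        "subject": [("England -- Fiction", {}), ("Love stories", {})],
    },
    "http://purl.org/dc/terms/": {},
}


def list_names(metadata):
    """Return every (namespace, name) pair that has at least one entry."""
    return [
        (namespace, name)
        for namespace, fields in metadata.items()
        for name, entries in fields.items()
        if entries
    ]


def values(metadata, namespace, name):
    """Return only the text values of one field, dropping the attribute dicts."""
    return [value for value, _attrs in metadata.get(namespace, {}).get(name, [])]


for namespace, name in list_names(metadata):
    print(namespace, name, values(metadata, namespace, name))
```

Each entry is a `(value, attributes)` tuple, which is why `values` unpacks two items per entry. Namespaces with no fields, such as the dcterms one, simply produce no pairs.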
Would it be possible to have a programmer-friendly dict instead of one that mirrors the XML namespaces?
Why not book.metadata["description"] instead of book.metadata["http://purl.org/dc/elements/1.1/"]["description"]?
(Then there would be no more need for get_metadata(namespace, name).)
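Until something like that exists, a flat view can be built on the caller's side. The sketch below assumes the nested `{namespace: {name: [(value, attrs), ...]}}` layout shown earlier in this thread; `flatten_metadata` is a hypothetical helper, not part of the library, and it deliberately throws away the namespace.

```python
def flatten_metadata(metadata):
    """Collapse {namespace: {name: [(value, attrs), ...]}} into {name: [...]}.

    Entries that share a name across namespaces are concatenated in
    namespace order, so the namespace information is lost.
    """
    flat = {}
    for fields in metadata.values():
        for name, entries in fields.items():
            flat.setdefault(name, []).extend(entries)
    return flat


# Small sample in the same shape as the parsed output (trimmed for brevity).
sample = {
    "http://www.idpf.org/2007/opf": {
        "cover": [(None, {"name": "cover", "content": "item1"})],
    },
    "http://purl.org/dc/elements/1.1/": {
        "title": [("Pride and Prejudice", {})],
        "subject": [("England -- Fiction", {}), ("Love stories", {})],
    },
}

flat = flatten_metadata(sample)
title, _attrs = flat["title"][0]
print(title)
print([value for value, _ in flat["subject"]])
```

The trade-off is that two namespaces can define the same name, and after flattening their entries end up in one list. That is probably the reason the parsed dictionary keeps the namespace level.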
Reading an epub: how to fetch all the metadata? (Something like a dict.) Or at least the list of names we need to pass to get_metadata()