aerkalov / ebooklib

Python E-book library for handling books in EPUB2/EPUB3 format -
https://ebooklib.readthedocs.io/
GNU Affero General Public License v3.0

Handle missing title from table of contents nodes that have children #278

Closed samuelclay closed 4 months ago

samuelclay commented 1 year ago

Use lxml's .text_content() instead of .text, because some ToC nodes look like this:

<li>
    <a href="text/chapter-1.xhtml"><span epub:type="z3998:roman">I</span>: Looking-Glass House</a>
</li>
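The snippet below reproduces the problem. It assumes lxml.html is used to parse the node and is an illustration, not code from the PR. The <a> element has no text before its <span> child, so .text is None. The rest of the title is stored as the tail of the <span>.

```python
import lxml.html

# The ToC node from the PR description, as a single HTML fragment.
LI_HTML = (
    '<li><a href="text/chapter-1.xhtml">'
    '<span epub:type="z3998:roman">I</span>: Looking-Glass House</a></li>'
)

# iter("a") finds the link wherever the parser placed the fragment.
a = next(lxml.html.fromstring(LI_HTML).iter("a"))

# .text is only the text before the first child element: there is none here.
print(a.text)             # None
# The visible title is split between the <span>'s text and its tail.
print(a[0].text)          # I
print(a[0].tail)          # : Looking-Glass House
# .text_content() joins the text of <a> and all of its descendants.
print(a.text_content())   # I: Looking-Glass House
```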

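For the <a> element above, .text returns only the text that comes before its first child element. Here there is no such text (None), so the ToC entry ends up without a title. This PR fixes that by using lxml's .text_content(), which collects the text of the node and all of its descendants, instead of the root node's .text alone.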
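Here is a minimal sketch of the fix in context. toc_titles is a hypothetical helper for illustration, not ebooklib's actual API. It walks a nav document and builds each entry's title with .text_content(). The //nav[@*='toc'] selector is assumed here as one way to match epub:type="toc" without registering a namespace prefix.

```python
import lxml.html


def toc_titles(nav_html: str) -> list[tuple[str, str]]:
    """Return (href, title) pairs for each ToC link (hypothetical helper)."""
    doc = lxml.html.document_fromstring(nav_html)
    entries = []
    # Match the <nav> that has any attribute equal to "toc" (epub:type="toc").
    for a in doc.xpath("//nav[@*='toc']//li/a"):
        # text_content() includes text inside child elements such as <span>;
        # split/join collapses whitespace left over from source indentation.
        title = " ".join(a.text_content().split())
        entries.append((a.get("href"), title))
    return entries


NAV = """<html><body>
<nav epub:type="toc"><ol>
  <li><a href="text/chapter-1.xhtml"><span epub:type="z3998:roman">I</span>: Looking-Glass House</a></li>
  <li><a href="text/chapter-2.xhtml">Chapter Two</a></li>
</ol></nav>
</body></html>"""

for href, title in toc_titles(NAV):
    print(href, "->", title)
```

With .text instead of .text_content(), the first entry's title would come back as None.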

aerkalov commented 4 months ago

Thanks for this!