prydom closed this 4 months ago.
Note that my editor added a missing import and cleaned up some trailing whitespace. If this is not desired, please let me know and I can rebase the diff.
I'll try to take a look at this later today, thanks for the submission!
This is really nice! I have not run into the issue you mentioned, but this approach is much cleaner and is a great improvement. Thanks for the PR, really appreciate it!
One thing: can you bump the version in setup.py to 1.2.2? That's all this needs before I'm ready to merge :)
Done and rebased onto main.
The EPUB-to-text conversion was not extracting text in anything resembling chapter order for a few of the EPUBs I tried it with.
The root cause is that the order in which items are listed in the manifest does not match the reading order given by the book's spine.
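To make the mismatch concrete, here is a minimal sketch using only Python's standard library. It parses a small, made-up `content.opf` (the IDs and file names are hypothetical, not taken from the book in this PR) and lists the manifest order next to the spine order.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OPF <package>, <manifest> and <spine> elements
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, trimmed-down content.opf: inserts are listed before the chapters
# in the manifest, but the spine interleaves them with the chapters.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="insert6" href="insert6.xhtml" media-type="application/xhtml+xml"/>
    <item id="insert1" href="insert1.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
    <itemref idref="insert1"/>
    <itemref idref="chapter2"/>
    <itemref idref="insert6"/>
  </spine>
</package>"""


def manifest_order(root):
    """IDs in the order the <manifest> happens to list them."""
    return [item.get("id") for item in root.findall("opf:manifest/opf:item", OPF_NS)]


def spine_order(root):
    """IDs in reading order, as defined by the <spine>."""
    return [ref.get("idref") for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]


root = ET.fromstring(SAMPLE_OPF)
print("manifest:", manifest_order(root))
print("spine:   ", spine_order(root))
```

Iterating the manifest yields `insert6, insert1, chapter1, chapter2`, while the spine gives the intended `chapter1, insert1, chapter2, insert6`.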
Below is an excerpt of a `content.opf` file from one of the books. Notice how all the content inserts are listed before any of the chapters, even though they should be interleaved between the chapters. Also notice that the inserts are not even in order among themselves (in the example below, the manifest lists them in the sequence 6, 1, 2, 3, 4, 5, 7, 8).

To resolve this issue, we iterate through the content IDs in the spine and look each one up in a dictionary that maps IDs to all manifest items of type `ebooklib.ITEM_DOCUMENT`.
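A minimal sketch of that lookup, kept independent of ebooklib so it runs on its own. It assumes the spine is a sequence of `(idref, linear)` pairs, which is our understanding of what ebooklib's reader stores in `book.spine`; the function name `documents_in_spine_order` is hypothetical and not the code from this PR. The `linear` flag is ignored here, since the description above does not say how non-linear items are handled.

```python
from typing import Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def documents_in_spine_order(
    spine: Iterable[Tuple[str, str]],
    documents_by_id: Dict[str, T],
) -> List[T]:
    """Return manifest documents following the spine's reading order.

    spine: (idref, linear) pairs in reading order.
    documents_by_id: manifest ID -> document item (or its content).
    """
    ordered = []
    for idref, _linear in spine:
        doc = documents_by_id.get(idref)
        # Skip spine entries that don't resolve to a document item
        if doc is not None:
            ordered.append(doc)
    return ordered


# With ebooklib (not executed here), the inputs could be built roughly like this:
#   import ebooklib
#   from ebooklib import epub
#   book = epub.read_epub("book.epub")
#   docs = {item.get_id(): item
#           for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT)}
#   items = documents_in_spine_order(book.spine, docs)

# Self-contained demo: manifest order differs from the spine order
spine = [("chapter1", "yes"), ("insert1", "yes"), ("chapter2", "yes"), ("insert6", "yes")]
docs = {
    "insert6": "Insert 6",
    "insert1": "Insert 1",
    "chapter1": "Chapter 1",
    "chapter2": "Chapter 2",
}
print(documents_in_spine_order(spine, docs))
```

Building the dictionary once keeps each spine lookup O(1), so the overall pass stays linear in the number of spine entries instead of rescanning the manifest for every chapter.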