aerkalov / ebooklib

Python E-book library for handling books in EPUB2/EPUB3 format -
https://ebooklib.readthedocs.io/
GNU Affero General Public License v3.0

Saving with `set_content` produces empty HTML #300

Open netw0rkf10w opened 8 months ago

netw0rkf10w commented 8 months ago

Hello,

First of all, thank you so much for your great work!

I have been trying to use your library to modify an existing EPUB, but for some reason the HTML documents in the saved file are empty:

import os
import argparse
from bs4 import BeautifulSoup
import ebooklib
from ebooklib import epub

def modify_epub(file_name, output):
    book = epub.read_epub(file_name)

    for item in book.get_items():
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            # soup = BeautifulSoup(item.get_content(), 'html.parser')
            soup = BeautifulSoup(item.get_content(), 'lxml')
            item.set_content(str(soup))

    output = os.path.expanduser(output)
    if os.path.exists(output):
        print(f'Removing existing file before saving: {output}')
        os.remove(output)
    epub.write_epub(output, book)

It seems that the issue lies in the line `item.set_content(str(soup))`. Could you please tell me what's wrong?

Thank you so much in advance for your help!
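
One thing that might be worth trying, offered as an unconfirmed guess rather than a diagnosis: `get_content()` appears to return bytes, while `str(soup)` passes a text string back to `set_content`. If the library's later parsing step rejects a text string that still carries an XML encoding declaration, the body could come out empty. Nothing in this thread confirms that this is the cause. The sketch below only shows the encoding step; `as_utf8_bytes` is a hypothetical helper name, not part of ebooklib or BeautifulSoup.

```python
def as_utf8_bytes(markup):
    """Return markup as UTF-8 bytes; bytes input is passed through unchanged."""
    if isinstance(markup, bytes):
        return markup
    # str() also covers objects such as a BeautifulSoup tree,
    # whose string form is the serialized document.
    return str(markup).encode("utf-8")


# Hypothetical use inside the loop from the report above (not run here,
# and assuming set_content accepts bytes):
#     item.set_content(as_utf8_bytes(soup))
```

If the saved file is still empty after this change, the cause lies elsewhere and the guess above can be discarded.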

c1924959470 commented 8 months ago

   Hello! I have received your message and will reply as soon as possible.