aerkalov / ebooklib

Python E-book library for handling books in EPUB2/EPUB3 format -
https://ebooklib.readthedocs.io/
GNU Affero General Public License v3.0
1.49k stars 234 forks

Get page as image #291

Open · Impre-visible opened this issue 1 year ago

Impre-visible commented 1 year ago

Hi, I want to read an EPUB and get all of its pages as images. Is that possible? I tried the following, but it doesn't work:

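```python
import io
from ebooklib import epub
from flask import send_file

def get_page(book_slug, page):  # hypothetical Flask view around the original snippet
    book = epub.read_epub(book_slug)
    item = book.get_items()[int(page)]  # TypeError: get_items() returns a generator
    content = item.get_content()
    image_stream = io.BytesIO(content)
    image_stream.seek(0)
    return send_file(image_stream, mimetype="image/jpeg")
```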
pbaletkeman commented 1 year ago

@Impre-visible you may need an external library such as Pillow (https://pillow.readthedocs.io/en/latest/handbook/index.html). I have used Pillow together with ebooklib to extract all the images (including SVG) from an EPUB and save them to files. These examples may help: https://stackoverflow.com/questions/68648801/generate-image-from-given-text
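A minimal sketch of the "extract all the images and save the files" step. It only writes each image item's raw bytes to disk, so it needs no Pillow at all; Pillow comes in if you want to decode or convert the images (and note that Pillow cannot open SVG files, so those are best written out as-is). `save_images` is a hypothetical helper name; it only assumes items with `get_name()` and `get_content()`, which is what ebooklib's items provide.

```python
import os
import posixpath


def save_images(image_items, out_dir):
    """Write each image item's raw bytes into out_dir and return the written paths.

    image_items: any iterable of objects with get_name() and get_content(),
    e.g. ebooklib's book.get_items_of_type(ebooklib.ITEM_IMAGE).
    """
    os.makedirs(out_dir, exist_ok=True)
    used = set()
    written = []
    for item in image_items:
        # Item names are paths inside the EPUB (e.g. "images/cover.jpg");
        # keep only the base name so nothing is written outside out_dir.
        filename = posixpath.basename(item.get_name())
        if not filename:
            continue
        stem, ext = os.path.splitext(filename)
        n = 1
        # Avoid overwriting same-named images that lived in different folders.
        while filename in used:
            filename = f"{stem}_{n}{ext}"
            n += 1
        used.add(filename)
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as fh:
            fh.write(item.get_content())
        written.append(path)
    return written
```

With ebooklib this would be called as `save_images(book.get_items_of_type(ebooklib.ITEM_IMAGE), "images_out")`. Only the base file name is kept, so two images called `cover.jpg` in different folders end up as `cover.jpg` and `cover_1.jpg`.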

Impre-visible commented 1 year ago

I found a way to do it; I'll post it here in a few hours so you can see how I did it.

aerkalov commented 1 year ago

Wait, do you want to render the HTML content of the pages as images, or get all the images inside the EPUB file?

If it is the former, it would probably be best to unzip the EPUB file to a temporary directory and use one of the third-party HTML2IMAGE libraries to create an image. Extracting to a temporary directory gives the third-party tool access to the images and CSS files.
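For the first option, a minimal sketch of the extraction step using only the standard library (`zipfile`, `tempfile`, `xml.etree`). It assumes a well-formed EPUB whose `META-INF/container.xml` points at the OPF package file, and it returns the extracted XHTML files in spine (reading) order. `extract_epub_pages` is a hypothetical helper name, and the actual rendering is left to whichever HTML-to-image library you pick.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def extract_epub_pages(epub_path, dest_dir=None):
    """Unzip an EPUB and return (dest_dir, page_paths), pages in spine order.

    The whole archive is extracted, so relative image and CSS links in the
    XHTML pages still resolve for an HTML-to-image tool.
    """
    if dest_dir is None:
        dest_dir = tempfile.mkdtemp(prefix="epub_")
    dest_dir = os.path.abspath(dest_dir)
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(dest_dir)

    # META-INF/container.xml names the package (OPF) document.
    container = ET.parse(os.path.join(dest_dir, "META-INF", "container.xml"))
    rootfile = container.find(".//c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    opf_path = os.path.join(dest_dir, *rootfile.get("full-path").split("/"))
    opf_dir = os.path.dirname(opf_path)

    package = ET.parse(opf_path).getroot()
    # Manifest maps item ids to hrefs, which are relative to the OPF file.
    manifest = {
        item.get("id"): item.get("href")
        for item in package.iterfind("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists the content documents in reading order.
    pages = []
    for itemref in package.iterfind("opf:spine/opf:itemref", OPF_NS):
        href = manifest[itemref.get("idref")]
        pages.append(os.path.join(opf_dir, *unquote(href).split("/")))
    return dest_dir, pages
```

Each returned path can be turned into a `file://` URL with `pathlib.Path(p).as_uri()` if your rendering tool expects one.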

If it is the latter, you can do something like this (a very simple version that skips any image post-processing):

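```python
import io

import ebooklib
from ebooklib.utils import guess_type
from flask import send_file

# `book` is the EpubBook returned by epub.read_epub()
for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    content_of_image = image.get_content()
    mt, en = guess_type(image.get_name())
    if mt:
        # send_file() expects a path or file-like object, so wrap the raw bytes
        send_file(io.BytesIO(content_of_image), mimetype=mt)
```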
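One caveat with that loop inside a Flask view: a request can only return one response, so calling `send_file` once per image does not deliver them all. A minimal sketch of picking a single image by index instead, using the standard library's `mimetypes.guess_type` (which, as far as I can tell, is what `ebooklib.utils.guess_type` wraps). `image_by_index` is a hypothetical helper, and the index plays the role of `page` from the question.

```python
import mimetypes


def image_by_index(image_items, index):
    """Return (data, mimetype) for the index-th image item.

    image_items: iterable of objects with get_name()/get_content(), e.g.
    book.get_items_of_type(ebooklib.ITEM_IMAGE). Raises IndexError if
    index is out of range.
    """
    # ebooklib's accessors return generators, so materialize before indexing.
    items = list(image_items)
    item = items[index]
    mimetype, _encoding = mimetypes.guess_type(item.get_name())
    # Fall back to a generic binary type when the extension is unknown.
    return item.get_content(), mimetype or "application/octet-stream"
```

In the view from the question this could look like `data, mt = image_by_index(book.get_items_of_type(ebooklib.ITEM_IMAGE), int(page))` followed by `return send_file(io.BytesIO(data), mimetype=mt)`.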