aerkalov / ebooklib

Python E-book library for handling books in EPUB2/EPUB3 format -
https://ebooklib.readthedocs.io/
GNU Affero General Public License v3.0

write_epub() is deleting the css link #313

Closed manujchandra closed 5 months ago

manujchandra commented 5 months ago

Hi,

Upon experimentation, I have found that before epub.write_epub(output_file, new_book) the CSS link is present.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"
    xmlns:ev="http://www.w3.org/2001/xml-events"
    epub:prefix="media: http://idpf.org/epub/vocab/media/#">
    <head>
        <meta charset="utf-8" />
        <!-- note: the media vocabulary referenced does not (yet) exist, used
               as an indicator that ultimately for background tracks to work,
               a way to indicate nature is needed so that reading systems can
               integrate on/off, volume controls etc with UI.                           
        -->
        <title>The entire transcript</title>
        <link rel="stylesheet" type="text/css" href="../css/shared-culture.css" />
    </head>

But once this chapter is written to disk, using epub.write_epub(output_file, new_book) the CSS link is deleted.

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/#" lang="en" xml:lang="en">
  <head>
    <title>xhtml/p60</title>
  </head>

The full script I am using can be found here

So, my conclusion is that write_epub is deleting the css link.

Is there any way to ensure the CSS link is written to disk, or is there something I am doing wrong? My intention is to take an XHTML file from the EPUB, split it by size while ensuring that the styles are applied to each chunk, and save the resulting EPUB part.

I am using this epub as a sample.

Thanks in advance.

SaschaUvA commented 5 months ago

Hi!

Unfortunately, the get_content() call inside the write_epub() function causes the information inside the <head> tag to be lost. I ran into this issue too, but found a workaround in another issue here: https://github.com/aerkalov/ebooklib/issues/221#issuecomment-1489878202

I hope this helps!

Best, Sascha

dajames-mk1 commented 5 months ago

Hi,

I found the same thing. The issue is that ebooklib writes its own <head> section to each HTML file, and this replaces any header you may have written yourself.
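For chapters that come from an existing EPUB, it can help to record which stylesheets the original <head> referenced before it is replaced. Here is a rough sketch using only the standard library. Note that stylesheet_hrefs is a hypothetical helper, not an ebooklib function, and it assumes the chapter source is well-formed XHTML bytes without HTML-only named entities such as &nbsp; (which ElementTree would reject):

```python
import xml.etree.ElementTree as ET

# Namespace used by XHTML content documents inside an EPUB.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def stylesheet_hrefs(xhtml: bytes) -> list:
    """Return the href of every <link rel="stylesheet"> in the document.

    Hypothetical helper for illustration; not part of ebooklib.
    """
    root = ET.fromstring(xhtml)
    hrefs = []
    for link in root.iter("{%s}link" % XHTML_NS):
        # rel may hold several space-separated tokens, e.g. "alternate stylesheet".
        rel_tokens = link.get("rel", "").lower().split()
        href = link.get("href")
        if "stylesheet" in rel_tokens and href:
            hrefs.append(href)
    return hrefs


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<!DOCTYPE html>\n"
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b"<head><title>The entire transcript</title>"
    b'<link rel="stylesheet" type="text/css" href="../css/shared-culture.css" />'
    b"</head><body></body></html>"
)
found = stylesheet_hrefs(sample)
print(found)
```

The collected hrefs could then be re-attached to each chunk using whichever workaround you settle on.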

It's possible to specify a CSS file by adding an appropriate EpubItem to each chapter as you go. If you want to put the stylesheet in a folder called styles, as is usual, you have to jump through a few hoops to get the name resolved correctly.

This is what I do. I create two EpubItems: one gets the style sheet included in the ebook, and the other creates the reference to it from each chapter.

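from ebooklib import epub

# Create an EpubItem object to include the CSS in the ebook.
# Read the file inside a with block so the handle is closed promptly.

with open("../my_styles.css") as css_file:
    doc_style = epub.EpubItem(
        uid="doc_style",
        file_name="styles/my_styles.css",
        media_type="text/css",
        content=css_file.read(),
    )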

# Add the style sheet to the book

journal_book.add_item(doc_style)

# ... and a second CSS object with a relative path to be referenced by
# chapters whose source is not in the EBOOK root (e.g. in text/).
# (we don't need any content, it's not going to be added to the document,
# it's just a holder for a relative path to the real CSS file)
#
# NOTE: This is very hacky and may one day stop working.

doc_style_ref = epub.EpubItem(
    uid="doc_style",
    file_name="../styles/my_styles.css",
    media_type="text/css",
    content="",
)

Note the difference between the file_name fields of the two.

Then, as each chapter is being processed, I call:

    chapter.add_item(doc_style_ref)

This adds the object containing the relative path name to the chapter.

ebooklib doesn't have any special logic to handle relative paths.
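Since the path has to be worked out by hand, the relative file_name for the reference item can be derived from the chapter's own file_name instead of being hard-coded. This is a minimal sketch using only the standard library. Note that relative_href is a hypothetical helper, not part of ebooklib, and it assumes both names are given relative to the same EPUB content root:

```python
import posixpath


def relative_href(chapter_file_name: str, target_file_name: str) -> str:
    """Return the path that reaches target_file_name from the chapter's folder.

    Hypothetical helper for illustration; both arguments are assumed to be
    relative to the same EPUB content root and to use forward slashes.
    """
    chapter_dir = posixpath.dirname(chapter_file_name) or "."
    return posixpath.relpath(target_file_name, start=chapter_dir)


# A chapter stored in text/ needs to climb one level to reach styles/.
href_from_text = relative_href("text/chapter1.xhtml", "styles/my_styles.css")

# A chapter stored in the content root needs no "../" prefix.
href_from_root = relative_href("chapter1.xhtml", "styles/my_styles.css")

print(href_from_text)
print(href_from_root)
```

The result could then be used as the file_name of the placeholder EpubItem above, so chapters at different folder depths each get a matching reference.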

I hope this is clear ... and useful!

manujchandra commented 5 months ago

I was able to implement this solution, which seems to be working for my use case.