JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

HTML changes from previous chapters when updating epub #959

Closed nascalsn closed 1 year ago

nascalsn commented 1 year ago

When updating an epub, HTML processing now strips attributes from tags in previously downloaded chapters that it used to leave alone (for example, the attribute epub:type="noteref"), and it also turns self-closing tags into separate opening and closing tags.

JimmXinu commented 1 year ago

On update, FFF runs the HTML of previously downloaded chapters through BeautifulSoup so that it can do things like image and link processing.

Part of FFF's HTML processing is to remove HTML attributes that aren't explicitly allowed:

## Some attributes cause problems for EBook readers.  By default,
## FanFicFare will remove all attributes except the ones specified
## from all tags.  (The only exception is that <img> tags will also
## keep src, alt and longdesc attributes.  data-orighref is used by
## internalize_text_links to preserve links when chapters are
## inserted.)
## Example: To add 'style', 'title' and 'align' to the list to keep,
## in your personal.ini [defaults] put:
## add_to_keep_html_attrs:,style,title,align
keep_html_attrs:href,name,class,id,colspan,rowspan,data-orighref

To keep that attribute, you can put add_to_keep_html_attrs:,epub:type under [defaults] in your personal.ini.
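As a rough illustration of how such an allow-list behaves, here is a minimal sketch in plain Python. It is not FFF's actual code: the function names are hypothetical, and it assumes the add_to_ value is simply appended to the base setting, which would explain the leading comma in the example above.

```python
def effective_keep_attrs(keep_html_attrs, add_to_keep_html_attrs=""):
    """Combine the base setting with the add_to_ setting (assumed: plain
    string concatenation) and split the result into attribute names."""
    combined = keep_html_attrs + add_to_keep_html_attrs
    return [name.strip() for name in combined.split(",") if name.strip()]


def filter_attrs(tag_name, attrs, keep):
    """Return only the attributes that are on the allow-list.

    <img> tags additionally keep src, alt and longdesc, as described in
    the comment block of the ini excerpt.
    """
    allowed = set(keep)
    if tag_name == "img":
        allowed |= {"src", "alt", "longdesc"}
    return {name: value for name, value in attrs.items() if name in allowed}


# Hypothetical usage with the default list plus epub:type.
keep = effective_keep_attrs(
    "href,name,class,id,colspan,rowspan,data-orighref",
    ",epub:type",
)
note_link = {"href": "#n1", "epub:type": "noteref", "style": "color:red"}
print(filter_attrs("a", note_link, keep))
```

With only the default list, epub:type would be dropped along with style; adding it through add_to_keep_html_attrs lets it survive.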

BeautifulSoup keeps a list, based on the HTML standards, of which tags are allowed to be self-closing and which aren't. If the set of tags being expanded has changed, that suggests BeautifulSoup itself has changed.
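To show the general idea, here is a small sketch of a serializer that decides between the two forms from a list of void elements. It is an illustration only, not BeautifulSoup's implementation: the set below follows the void elements named in the current HTML standard, and the list BeautifulSoup actually uses may differ between versions.

```python
import html

# Void elements as listed in the HTML standard (illustrative; this is
# not copied from BeautifulSoup and its own list may differ).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def serialize_empty_tag(name, attrs=None):
    """Serialize a tag that has no children.

    Void elements are written in self-closing form; every other tag is
    written as an explicit opening and closing pair.
    """
    attr_text = "".join(
        f' {key}="{html.escape(value, quote=True)}"'
        for key, value in (attrs or {}).items()
    )
    if name in VOID_ELEMENTS:
        return f"<{name}{attr_text}/>"
    return f"<{name}{attr_text}></{name}>"


print(serialize_empty_tag("br"))                   # <br/>
print(serialize_empty_tag("a", {"id": "note1"}))   # <a id="note1"></a>
```

Under a rule like this, a chapter that originally contained a self-closing anchor would come back with an opening and closing pair after a round trip through the parser.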

Are you using the CLI or the Calibre plugin version of FFF? If CLI, which version of beautifulsoup4 do you have installed? If plugin, which version of Calibre?
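For the CLI case, one possible way to look up the installed versions is the standard library's importlib.metadata (Python 3.8+). This is a sketch: the helper name is hypothetical, and it assumes the packages were installed under the distribution names beautifulsoup4 and FanFicFare in the same environment that runs the script.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution
    is not installed in the current environment."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if yours differ.
for dist in ("beautifulsoup4", "FanFicFare"):
    print(dist, installed_version(dist) or "not installed")
```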

Can you provide before and after epubs that demonstrate this?