JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Less than sign in threadmark title on SpaceBattles not being escaped. #863

Closed: HowardJeng closed this issue 2 years ago

HowardJeng commented 2 years ago

It looks like threadmarks containing a less-than sign aren't being properly escaped. I'm running FanFicFare 4.13.0 as a calibre plug-in and pointed it at https://forums.spacebattles.com/threads/princess-worm-rwby.761227/, which contains a threadmark titled "Akelarre <3 Neo, by Metaphorical Grapevine". In the generated file OEBPS/file0088.xhtml, it inserts an unescaped <3 in the title tags, which causes errors that look like "error on line 4 at column 26: StartTag: invalid element name". The specific problematic post is https://forums.spacebattles.com/posts/59187214/

JimmXinu commented 2 years ago

I know that HTML entities in chapter titles were tested in the past. Since then, BeautifulSoup has changed to be more 'helpful' about decoding HTML entities when extracting text than it used to be, so a title that was `&lt;3` in the page source comes out as a literal `<3`. I vaguely recall that it's been an issue in other contexts.
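For anyone following along, here is a minimal sketch of the failure and the general remedy, using only the Python standard library. It is not FanFicFare's actual code or the actual fix; `build_title_element` is a hypothetical helper, and the assumption is simply that the title arrives as already-decoded plain text and is then written into XHTML.

```python
import html
import xml.etree.ElementTree as ET


def build_title_element(raw_title: str) -> str:
    """Hypothetical helper: wrap a plain-text chapter title in <title> tags.

    The title is assumed to be decoded text (entities already resolved),
    so it must be re-escaped before being placed in XHTML.
    """
    # quote=False: only &, < and > need escaping inside element content.
    return "<title>" + html.escape(raw_title, quote=False) + "</title>"


title = "Akelarre <3 Neo, by Metaphorical Grapevine"

# Without escaping, an XML parser reads "<3" as the start of a tag and fails.
unescaped = "<title>" + title + "</title>"
try:
    ET.fromstring(unescaped)
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

# With escaping, the markup is well-formed and the text round-trips.
escaped = build_title_element(title)
parsed = ET.fromstring(escaped)
```

The point is that escaping has to happen at the moment the text is written into markup, regardless of how the text was extracted upstream.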

Test versions up in the usual places.