JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Quotation marks lost on siye.co.uk #887

Closed kpmgeek closed 1 year ago

kpmgeek commented 1 year ago

Quotation marks are dropped in fics from siye.co.uk. I believe this is a regression in the last few versions.

JimmXinu commented 1 year ago

I'm not seeing it. Story URL?

kpmgeek commented 1 year ago

Seen it across a number of stories recently. https://www.siye.co.uk/viewstory.php?sid=130652

FanFicFare 4.16.0 on Calibre 6.3, Linux x64.

JimmXinu commented 1 year ago

The site siye.co.uk reports its pages as being iso-8859-1 encoded. Historically, most stories on it have actually been Windows-1252 encoded. That particular story is utf8 encoded, but some of the surrounding HTML provided by the site is still iso-8859-1 (or Windows-1252), so strict utf8 decoding also fails.

One of the places where differences between iso-8859-1 / Windows-1252 / utf8 show up is directional quotation marks.
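To illustrate the point about directional quotation marks, here is a small Python sketch using only the standard codecs. It is not taken from FFF; the variable names are made up for the example. It shows how the same left double quotation mark is represented in each encoding and what happens when the bytes are decoded with the wrong one.

```python
# Illustration only: standard Python codecs, not FanFicFare code.
left_quote = "\u201c"  # LEFT DOUBLE QUOTATION MARK

# The same character is stored as different bytes in each encoding.
utf8_bytes = left_quote.encode("utf-8")     # three bytes: e2 80 9c
cp1252_bytes = left_quote.encode("cp1252")  # one byte: 93

# iso-8859-1 has no directional quotation marks at all.
try:
    left_quote.encode("iso-8859-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

# utf8 bytes read as Windows-1252 turn into three unrelated characters.
mojibake = utf8_bytes.decode("cp1252")

# A Windows-1252 quote read as iso-8859-1 becomes an invisible control character.
as_latin1 = cp1252_bytes.decode("iso-8859-1")

# A Windows-1252 quote is not valid utf8: strict decoding raises,
# while errors="ignore" silently drops the byte (and with it the quote).
try:
    cp1252_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False
dropped = cp1252_bytes.decode("utf-8", errors="ignore")
```

After running this, `mojibake` holds the familiar three-character garbage sequence, `as_latin1` is the unprintable U+0093, and `dropped` is an empty string.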

FFF has a utf8:ignore encoding setting that makes decoding more forgiving. You can change the encoding for a particular story in personal.ini:

[https://www.siye.co.uk/viewstory.php?sid=130652]
website_encodings:utf8:ignore

Or for all stories on that site like this:

[www.siye.co.uk]
website_encodings:Windows-1252,utf8,utf8:ignore

The latter will work in most, but not necessarily all, cases.
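As a rough mental model of an ordered encoding list like the one above, the following Python sketch tries each entry in turn and returns the first successful decode. This is an assumption about the general technique, not FFF's actual implementation: the function name `decode_with_fallbacks` and the treatment of the `:ignore` suffix as Python's `errors="ignore"` are hypothetical.

```python
from typing import Iterable


def decode_with_fallbacks(data: bytes, encodings: Iterable[str]) -> str:
    """Hypothetical helper: try each encoding in order, first success wins.

    An entry like "utf8:ignore" is assumed to mean: decode as utf8 and
    skip any bytes that are not valid utf8.
    """
    for entry in encodings:
        codec, _, errors = entry.partition(":")
        try:
            return data.decode(codec, errors or "strict")
        except UnicodeDecodeError:
            continue  # fall through to the next encoding in the list
    raise ValueError("none of the listed encodings could decode the data")


# A utf8 story fragment with directional quotes, followed by one stray
# iso-8859-1 byte (0xa9) standing in for the site's surrounding HTML.
page = "\u201cHi\u201d".encode("utf-8") + b"\xa9"

text = decode_with_fallbacks(page, ["Windows-1252", "utf8", "utf8:ignore"])
```

Here Windows-1252 fails because the closing quote contains byte 0x9d, which Python's cp1252 codec leaves undefined; strict utf8 fails on the stray 0xa9; and `utf8:ignore` succeeds, keeping both quotation marks and dropping the stray byte. In this sketch, a utf8 page whose bytes all happen to be valid Windows-1252 would be accepted by the first entry and come out garbled, which may be one reason a site-wide list does not work in every case.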

kpmgeek commented 1 year ago

Thanks, I suspected it was a UTF-8 issue. Will set that in the future.