alexadam / save-as-ebook

Save a web page/selection as an eBook (.epub format) - a Chrome/Firefox/Opera Web Extension
MIT License

Handle special characters in title/text #10

Closed: asifm91 closed this issue 4 years ago

asifm91 commented 7 years ago

The add-on doesn't handle special characters properly. I was testing on a page whose title contains an '&' character, which was not escaped/converted. As a result, the generated XML files (content.opf, toc.ncx etc.) are not parsable. Tested on Firefox 55 with the official add-on.

asifm91 commented 6 years ago

I've looked into saveEbook.js and found the lines that can cause this issue.

Basically, contents of the variables ebookName and page.title need to be escaped properly. They occur on lines 71, 87, 93, 113 and 125.

I'm not sure about escaping page.content on line 116. Since it is already processed by a parser (in extractHtml.js), I guess escaping is not required. Can you please confirm this?

alexadam commented 6 years ago

I released a new version with a fix for this issue - 1.2.2, only in Chrome because I had a problem with ff. Can you check it? This is just a quick fix... the page.title was never escaped. I should add a pre-processing function just before saving. Thank you so much for your feedback and help! I'll address the other issues in the next days/weeks

asifm91 commented 6 years ago

I don't have Chrome installed at this moment. But looking at your commit changes, '&' issue should be fixed. However, there are other XML special characters which are not addressed:

Only '&', '<' and '>' need to be encoded; single and double quotes are not that important. See here for details...
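To make the suggestion concrete, here is a minimal sketch of such an escaping step. The helper name `escapeXml` and the `<dc:title>` usage are assumptions for illustration, not code taken from saveEbook.js. It only covers text placed inside element content; a value written into an attribute would probably need its quotes escaped as well.

```javascript
// Hypothetical helper; the name escapeXml does not come from saveEbook.js.
// Escapes the three characters that break XML element content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Illustrative use: a title element as it might appear in content.opf.
const title = "Q&A <draft>";
const titleElement = "<dc:title>" + escapeXml(title) + "</dc:title>";
console.log(titleElement); // prints <dc:title>Q&amp;A &lt;draft&gt;</dc:title>
```

Note that input which is already escaped (for example a title containing `&amp;`) would be escaped a second time, so this should be applied exactly once, to the raw title.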

The getEbookFileName function should be changed accordingly.

Here are some suggestions regarding cross-platform file naming.
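As a rough illustration of what cross-platform-safe naming could look like, the sketch below replaces characters that Windows rejects in file names and trims trailing dots and spaces. The function name `sanitizeFileName` and the `"eBook"` fallback are hypothetical; this is not the existing getEbookFileName, and it does not handle reserved device names such as CON or NUL.

```javascript
// Hypothetical helper, not the add-on's getEbookFileName.
function sanitizeFileName(name, fallback = "eBook") {
  const cleaned = String(name)
    .replace(/[<>:"\/\\|?*\x00-\x1F]/g, "_") // characters Windows rejects, plus control characters
    .replace(/\s+/g, " ")                    // collapse runs of whitespace
    .trim()
    .replace(/[. ]+$/, "");                  // Windows disallows trailing dots and spaces
  // Fall back to a default (an assumption) if nothing usable is left.
  return cleaned.length > 0 ? cleaned : fallback;
}

console.log(sanitizeFileName("a/b:c"));       // prints a_b_c
console.log(sanitizeFileName("Tom & Jerry")); // prints Tom & Jerry
```

'&' is a valid file-name character on the common platforms, so XML escaping and file-name sanitizing are probably best kept as two separate steps.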