alexadam / save-as-ebook

Save a web page/selection as an eBook (.epub format) - a Chrome/Firefox/Opera Web Extension
MIT License

Ruby rb tags ignored by extractHtml.js #54

Closed: MichaelPetre closed this issue 2 years ago

MichaelPetre commented 2 years ago

If you try to convert a Japanese webpage containing ruby tags, the rb tags are ignored by the parser.

<ruby><rb>私</rb><rp>(</rp><rt>わたくし</rt><rp>)</rp></ruby> gets saved as <ruby class="MG357"><rt class="WF360">わたくし</rt></ruby>

As a result, the epub file contains the furigana but is missing the kanji.

Expected output: 私 with わたくし as furigana
Actual output: わたくし only

This is caused by line 15 of extractHtml.js, whose tag list does not include 'rb':
'dfn', 'em', 'i', 'img', 'kbd', 'mark', 'q', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span',

Adding the rb tag solves the issue: 'dfn', 'em', 'i', 'img', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span',
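
To illustrate why a missing list entry can drop the kanji, here is a minimal sketch of an allowlist filter. It is not the code from extractHtml.js: the names ALLOWED_INLINE_TAGS, filterTree and textOf, the simplified node shape, and the assumption that an unlisted element is discarded together with its text are all hypothetical, chosen only to mirror the behavior reported above.

```javascript
// Hypothetical allowlist modeled on the line quoted in this issue, with 'rb' added.
const ALLOWED_INLINE_TAGS = new Set([
  'dfn', 'em', 'i', 'img', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc',
  'ruby', 's', 'samp', 'small', 'span',
]);

// Simplified node shape for illustration: a plain string is a text node,
// anything else is { tag, children }.
// Assumption: an element whose tag is not allowed is dropped with its content.
function filterTree(node, allowed) {
  if (typeof node === 'string') return node;
  if (!allowed.has(node.tag)) return null;
  const children = node.children
    .map((child) => filterTree(child, allowed))
    .filter((child) => child !== null);
  return { tag: node.tag, children };
}

// Concatenate the text that survives filtering.
function textOf(node) {
  if (node === null) return '';
  if (typeof node === 'string') return node;
  return node.children.map(textOf).join('');
}

// The ruby element from the report, without the <rp> fallback parentheses.
const ruby = {
  tag: 'ruby',
  children: [
    { tag: 'rb', children: ['私'] },
    { tag: 'rt', children: ['わたくし'] },
  ],
};

// Same list as before the fix: everything except 'rb'.
const withoutRb = new Set([...ALLOWED_INLINE_TAGS].filter((t) => t !== 'rb'));

console.log(textOf(filterTree(ruby, withoutRb)));           // furigana only
console.log(textOf(filterTree(ruby, ALLOWED_INLINE_TAGS))); // kanji and furigana
```

Under these assumptions, the only difference between the two runs is whether 'rb' is in the set, which matches the one-entry fix proposed above.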

MichaelPetre commented 2 years ago

Fixed in pull request #56