dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

menus and navigation included with each chapter for novel bin #1345

Closed spg19132 closed 2 months ago

spg19132 commented 4 months ago

Describe the bug

Both the top and bottom menus, as well as the navigation links before and after the chapter, are included with each chapter.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://novelbin.com/b/forgotten-legend-of-the-bloodied-flower#tab-chapters-title
  2. Click on "pack epub" (with the following options selected: "no Additional Metadata," "less tags," "Skip Images," and "Add Information page to Epub")
  3. The result is an ePub file that contains the extra content described above.

Desktop (please complete the following information):

Thank you for your help :)

dteviot commented 4 months ago

@spg19132

Annoying. The chapters appear to be scattered across multiple hosts. Or, at least, multiple host aliases. Which has confused WebToEpub.

Hosts seem to be variations of: .novelcenter.net or .noveljar.org

Try adding a pattern match for these hosts. Note, if that doesn't work, it might be easier to provide a script for https://github.com/dteviot/EpubEditor to clean up the chapters after downloading.
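
A minimal sketch of what such a pattern match could look like. The names `NOVELBIN_HOST_SUFFIXES` and `isNovelBinChapterHost` are hypothetical and are not taken from WebToEpub's code; only the two host suffixes come from the observation above.

```javascript
// Hypothetical helper, not part of WebToEpub.
// Host suffixes observed for NovelBin chapter pages (from the comment above).
const NOVELBIN_HOST_SUFFIXES = [".novelcenter.net", ".noveljar.org"];

// Returns true when the URL's hostname ends with one of the known suffixes.
function isNovelBinChapterHost(url) {
    let hostname;
    try {
        hostname = new URL(url).hostname.toLowerCase();
    } catch (e) {
        return false; // not a valid absolute URL
    }
    return NOVELBIN_HOST_SUFFIXES.some(suffix => hostname.endsWith(suffix));
}
```

Because each suffix starts with a dot, this only matches subdomains of the two hosts, which is an assumption about how the aliases are formed. It also breaks again as soon as the site moves to a new domain.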

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: 27 minutes work

dteviot commented 4 months ago

@spg19132

Updated version (0.0.0.160) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours to 21 days.

swanknight commented 3 months ago

@dteviot The current version (0.0.0.160) didn't fix the issue for me, so I decided to write a script to run after the fact. I have only tested it on one novel so far, but I'll leave it here in case it helps anyone in the future. The script removes all the Novelbin garbage and creates the chapter titles as best it can. The script has been updated, thanks to @dteviot.

let wrapper = dom.querySelector("#wrapper");
let content = dom.querySelector("#chr-content");
let scripts = dom.getElementsByTagName("script");
let h3 = dom.querySelector("h3");
let p = dom.querySelectorAll('p');

// Replace wrapper with content if both exist
if (wrapper && content) {
    wrapper.replaceWith(content);
}

// Remove all inline styles from content
content?.removeAttribute('style');

// Remove all script elements
for (let i = scripts.length - 1; i >= 0; i--) {
    scripts[i].remove();
}

// Create chapter titles if the first line of the first paragraph is a chapter heading
if (p.length > 0) {
    let fline = p[0].textContent.trim();
    let chapterMissing = fline.match(/^\d/);

    if (fline.startsWith('Chapter') || chapterMissing) {
        let newElement = dom.createElement("h1");

        newElement.textContent = chapterMissing ? "Chapter " + fline : fline;
        p[0].replaceWith(newElement);
    }
}

// Convert h3 to h1 for consistency
if (h3) {
    let newElement = dom.createElement("h1");
    newElement.textContent = h3.textContent;
    h3.replaceWith(newElement);
}

// Clean miscellaneous content
for (let i = 0; i < p.length; ++i) {
    let str = p[i].textContent;

    if (str.includes('Transl') && str.includes('Edit')) {
        p[i].remove();
    } else if (str.includes('(.)')) {
        // Cut everything after the last sentence end; leave the text alone if there is none
        let index = str.lastIndexOf(". ");

        if (index !== -1) {
            p[i].textContent = str.slice(0, index + 1);
        }
    }
}

return true;

dteviot commented 3 months ago

@swanknight

Try https://github.com/dteviot/EpubEditor/blob/master/mutators/CleanNovelbin.js

Also, FYI, JavaScript now has a better way of doing

    nav.parentNode.removeChild(nav);

try

    nav.remove();

https://developer.mozilla.org/en-US/docs/Web/API/Element/remove

dteviot commented 3 months ago

Note, the problem is that the site keeps changing the hostname it uses for the chapters. This confuses WebToEpub, so it doesn't select the correct parser to decode the pages. But I've got an idea for how to fix it.

Basically, set the chapter URLs to match the current host and rely on redirection to get to the wanted page.
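
A rough sketch of that idea, using the standard URL API. The function name `rehostChapterUrl` and the example URLs are hypothetical, and whether the site actually redirects from the original host to the chapter's real host is an assumption this relies on.

```javascript
// Hypothetical helper, not WebToEpub's actual code.
// Points a chapter URL at the host the chapter list was loaded from,
// keeping the path, query string and fragment unchanged.
function rehostChapterUrl(chapterUrl, currentPageUrl) {
    const rewritten = new URL(chapterUrl);
    const current = new URL(currentPageUrl);
    rewritten.protocol = current.protocol;
    rewritten.hostname = current.hostname;
    rewritten.port = current.port; // an empty string resets to the default port
    return rewritten.href;
}

// Example with made-up URLs:
console.log(rehostChapterUrl(
    "https://abc.novelcenter.net/b/some-novel/chapter-1",
    "https://novelbin.com/b/some-novel"
));
```

With every chapter URL on the same host as the chapter list, the parser chosen for the list page would also apply to the chapters.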

swanknight commented 3 months ago

Thanks for the tip, as well as the clean script. 👌 I used your script, and added to it the extra stuff I wanted to fix. I also added comments to make it easier to understand the purpose of the extra blocks.

dteviot commented 3 months ago

@swanknight I see a couple more teachable moments.

    if (content) {
        content.removeAttribute('style');
    }

can be

    content?.removeAttribute('style');

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining

and

        parent.replaceChild(newElement, p[0]);

can be

        p[0].replaceWith(newElement);

https://developer.mozilla.org/en-US/docs/Web/API/Element/replaceWith

dteviot commented 3 months ago

Going to reopen, because the problem has reappeared. Need to fix it as I described earlier:

Note, the problem is that the site keeps changing the hostname it uses for the chapters. This confuses WebToEpub, so it doesn't select the correct parser to decode the pages. But I've got an idea for how to fix it. Basically, set the chapter URLs to match the current host and rely on redirection to get to the wanted page.

(PROBABLY BETTER SOLUTION) When fetching the chapter list, register the hostnames of the chapter URLs as using the NovelBin parser.
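
A minimal sketch of the registration idea, assuming a simple hostname-to-parser lookup. The names `parserByHostname`, `registerChapterHosts` and `findParserName` are hypothetical and do not describe how WebToEpub actually selects parsers.

```javascript
// Hypothetical registry, not WebToEpub's actual parser lookup: hostname -> parser name.
const parserByHostname = new Map([["novelbin.com", "NovelBin"]]);

// Registers the hostname of every chapter URL as handled by the given parser.
function registerChapterHosts(chapterUrls, parserName) {
    for (const url of chapterUrls) {
        parserByHostname.set(new URL(url).hostname, parserName);
    }
}

// Looks up the parser for a URL; undefined means nothing is registered for its host.
function findParserName(url) {
    return parserByHostname.get(new URL(url).hostname);
}

// Example with made-up chapter URLs, as if read from the chapter list:
registerChapterHosts([
    "https://abc.novelcenter.net/b/some-novel/chapter-1",
    "https://xyz.noveljar.org/b/some-novel/chapter-2"
], "NovelBin");
```

Unlike a fixed list of host patterns, this picks up whatever hostnames the chapter list contains at download time, so a renamed host would not need a new release.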

swanknight commented 3 months ago

@swanknight I see a couple more teachable moments.

Thanks, @dteviot sensei. 😁

dteviot commented 3 months ago

@spg19132 @swanknight

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:

For my notes: 47 minutes work

swanknight commented 3 months ago

Tested with this novel, chapters 1 through 50. All good. 👍

dteviot commented 2 months ago

@swanknight @spg19132 Updated version (0.0.0.167) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours to 21 days.