spg19132 closed this issue 2 months ago
@spg19132
Annoying. The chapters appear to be scattered across multiple hosts, or at least multiple host aliases, which has confused WebToEpub.
The hosts seem to be variations of `.novelcenter.net` or `.noveljar.org`.
Try adding a pattern match for these hosts. Note: if that doesn't work, it might be easier to provide a script for https://github.com/dteviot/EpubEditor to clean up the chapters after downloading.
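A minimal sketch of such a host pattern match, assuming the chapter hosts are the bare domain or any subdomain of `novelcenter.net` or `noveljar.org`. The name `isNovelBinChapterHost` and the example URLs are hypothetical; this is not WebToEpub's actual parser-selection code.

```javascript
// Hypothetical helper: does this URL's hostname belong to one of the
// chapter-host domains seen in this issue?
const CHAPTER_HOST_DOMAINS = ["novelcenter.net", "noveljar.org"];

function isNovelBinChapterHost(url) {
    const hostname = new URL(url).hostname.toLowerCase();
    // Match the bare domain or any subdomain, e.g. "www.noveljar.org".
    // The leading "." stops look-alikes such as "evilnoveljar.org" from matching.
    return CHAPTER_HOST_DOMAINS.some(
        (domain) => hostname === domain || hostname.endsWith("." + domain)
    );
}

console.log(isNovelBinChapterHost("https://www.novelcenter.net/b/some-novel/chapter-1")); // true
console.log(isNovelBinChapterHost("https://example.org/chapter-1")); // false
```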
Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:
For my notes: 27 minutes work
@spg19132
Updated version (0.0.0.160) has been submitted to Firefox and Chrome stores. The Firefox version is available now. The Chrome version might take anywhere from a few hours to 21 days to become available.
@dteviot The current version (0.0.0.160) didn't fix the issues for me, so I wrote a script to run after the fact. I've only tested it on one novel so far, but I'll leave it here in case it helps anyone in the future. The script cleans out all the Novelbin garbage and creates the chapter titles as best it can. (The script has been updated, thanks to @dteviot.)
```javascript
// Run as an EpubEditor mutator: `dom` is the chapter document it passes in.
const wrapper = dom.querySelector("#wrapper");
const content = dom.querySelector("#chr-content");
const scripts = dom.getElementsByTagName("script");
const h3 = dom.querySelector("h3");
const p = dom.querySelectorAll('p');

// Replace wrapper with content if both exist
if (wrapper && content) {
    wrapper.replaceWith(content);
}

// Remove the inline style attribute from the content container
content?.removeAttribute('style');

// Remove all script elements (iterate backwards: the HTMLCollection is live)
for (let i = scripts.length - 1; i >= 0; i--) {
    scripts[i].remove();
}

// Create a chapter title if the first paragraph looks like a chapter heading
// ("Chapter ..." or a line starting with a digit that is missing the "Chapter" prefix)
if (p.length > 0) {
    const fline = p[0].textContent.trim();
    const chapterMissing = /^\d/.test(fline);
    if (fline.startsWith('Chapter') || chapterMissing) {
        const newElement = dom.createElement("h1");
        newElement.textContent = chapterMissing ? "Chapter " + fline : fline;
        p[0].replaceWith(newElement);
    }
}

// Convert h3 to h1 for consistency
if (h3) {
    const newElement = dom.createElement("h1");
    newElement.textContent = h3.textContent;
    h3.replaceWith(newElement);
}

// Clean miscellaneous content
for (let i = 0; i < p.length; ++i) {
    const str = p[i].textContent;
    if (str.includes('Transl') && str.includes('Edit')) {
        // Drop translator/editor credit lines
        p[i].remove();
    } else if (str.includes('(.)')) {
        // Cut the trailing watermark after the last sentence.
        // Skip if there is no ". " to cut at, otherwise the paragraph would be emptied.
        const index = str.lastIndexOf(". ");
        if (index >= 0) {
            p[i].textContent = str.slice(0, index + 1);
        }
    }
}
return true;
```
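The title and watermark rules in the script above can be pulled out into pure string functions so they can be checked without EpubEditor or a DOM. A sketch only: `toChapterHeading` and `trimAfterLastSentence` are hypothetical names, and the watermark trim includes a guard so a paragraph with no `". "` is left unchanged rather than emptied.

```javascript
// Hypothetical helper mirroring the script's heading rule.
// Returns the heading text to use, or null if the line is not a chapter heading.
function toChapterHeading(firstLine) {
    const line = firstLine.trim();
    if (line.startsWith("Chapter")) {
        return line;
    }
    // A leading digit means the "Chapter" prefix is missing, e.g. "12 - The Return"
    if (/^\d/.test(line)) {
        return "Chapter " + line;
    }
    return null;
}

// Keep text up to and including the last sentence-ending "." before a space,
// dropping whatever trails it; return the text unchanged if there is no ". ".
function trimAfterLastSentence(text) {
    const index = text.lastIndexOf(". ");
    return index >= 0 ? text.slice(0, index + 1) : text;
}

console.log(toChapterHeading("  12 - The Return ")); // Chapter 12 - The Return
console.log(trimAfterLastSentence("He left. Read more at site(.)com")); // He left.
```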
@swanknight
Try https://github.com/dteviot/EpubEditor/blob/master/mutators/CleanNovelbin.js
Also, FYI, JavaScript now has a better way of doing `nav.parentNode.removeChild(nav);`. Try `nav.remove();` instead.
https://developer.mozilla.org/en-US/docs/Web/API/Element/remove
Note: the problem is that the site keeps changing the hostname it uses for the chapters. This confuses WebToEpub, so it doesn't select the correct parser to decode the pages. But I've got an idea how to fix it.
Basically, set the chapter URLs to match the current host and rely on redirection to get to the wanted page.
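A minimal sketch of that idea using the standard `URL` API, assuming each chapter's path is also valid on the host currently being browsed and that the site redirects from there. `rehostChapterUrl` and all the URLs are hypothetical placeholders, not WebToEpub code.

```javascript
// Point a chapter URL at the host the user is currently browsing,
// keeping path, query and fragment, and let the site redirect as needed.
function rehostChapterUrl(chapterUrl, currentPageUrl) {
    const rewritten = new URL(chapterUrl);
    const current = new URL(currentPageUrl);
    rewritten.protocol = current.protocol;
    rewritten.host = current.host; // host includes the port, if any
    return rewritten.href;
}

const chapters = [
    "https://www.novelcenter.net/b/some-novel/chapter-1",
    "https://noveljar.org/b/some-novel/chapter-2",
];
const rehosted = chapters.map(
    (u) => rehostChapterUrl(u, "https://current-host.example/b/some-novel")
);
console.log(rehosted);
```

Rewriting only the protocol and host keeps the rest of each chapter URL intact, so the redirect has everything it needs to find the wanted page.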
Thanks for the tip, as well as the clean script. 👌 I used your script and added the extra fixes I wanted. I also added comments to make the purpose of the extra blocks easier to understand.
@swanknight I see a couple more teachable moments.
```javascript
if (content) {
    content.removeAttribute('style');
}
```
can be `content?.removeAttribute('style');`
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining

and `parent.replaceChild(newElement, p[0]);` can be `p[0].replaceWith(newElement);`
https://developer.mozilla.org/en-US/docs/Web/API/Element/replaceWith
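To see the optional-chaining rewrite behave without a browser, here is a small sketch where plain objects stand in for DOM elements (`makeFakeElement` is a made-up helper) and `null` plays the role of a `querySelector` miss. The two forms are equivalent here because `querySelector` returns either an element or `null`; `replaceWith` needs a real DOM, so it isn't exercised.

```javascript
// Plain objects stand in for DOM elements so this runs outside a browser.
function makeFakeElement() {
    const attributes = new Map([["style", "color: red"]]);
    return {
        attributes,
        removeAttribute(name) { attributes.delete(name); },
    };
}

const a = makeFakeElement();
const b = makeFakeElement();
const missing = null; // what querySelector returns when nothing matches

// Explicit guard (the original form)
if (a) {
    a.removeAttribute("style");
}

// Optional chaining (the shorter form)
b?.removeAttribute("style");

// On null, `?.` short-circuits to undefined instead of throwing a TypeError
const result = missing?.removeAttribute("style");

console.log(a.attributes.has("style"), b.attributes.has("style")); // false false
console.log(result); // undefined
```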
Going to reopen, because the problem has reappeared. It needs the fix I described earlier: the site keeps changing the hostname it uses for the chapters, which confuses WebToEpub so it doesn't select the correct parser to decode the pages. The idea is to set the chapter URLs to match the current host and rely on redirection to get to the wanted page.
(PROBABLY BETTER SOLUTION) When getting the chapter list, register the chapters' hostnames as using the NovelBin parser.
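A sketch of the "register the hostnames" approach, assuming a simple hostname-to-parser map. The registry, `registerChapterHosts`, `parserFor`, the `"NovelBinParser"` string and the URLs are all hypothetical illustrations, not WebToEpub's actual parser factory.

```javascript
// Hypothetical registry mapping hostname -> parser name.
const parserByHost = new Map([["novelbin.example", "NovelBinParser"]]);

// After fetching the chapter list with `parserName`, register every
// chapter hostname that is not already known.
function registerChapterHosts(chapterUrls, parserName) {
    for (const url of chapterUrls) {
        const hostname = new URL(url).hostname;
        // Don't overwrite an existing registration owned by another parser.
        if (!parserByHost.has(hostname)) {
            parserByHost.set(hostname, parserName);
        }
    }
}

// Look up the parser for a URL, or null if its host is unknown.
function parserFor(url) {
    return parserByHost.get(new URL(url).hostname) ?? null;
}

registerChapterHosts(
    [
        "https://www.novelcenter.net/b/some-novel/chapter-1",
        "https://noveljar.org/b/some-novel/chapter-2",
    ],
    "NovelBinParser"
);
console.log(parserFor("https://noveljar.org/b/some-novel/chapter-2")); // NovelBinParser
```

Unlike rewriting the URLs, this keeps each chapter's original address and only teaches the lookup which parser handles it.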
Thanks, @dteviot sensei. 😁
@spg19132 @swanknight
Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:
For my notes: 47 minutes work
Tested with this novel, chapters 1 through 50. All good. 👍
@swanknight @spg19132 Updated version (0.0.0.167) has been submitted to Firefox and Chrome stores. The Firefox version is available now. The Chrome version might take anywhere from a few hours to 21 days to become available.
Describe the bug
Both the top and bottom menus, as well as the navigation links before and after the chapter, are included with each chapter.

To Reproduce
Steps to reproduce the behavior:
Desktop (please complete the following information):
Thank you for your help :)