Closed hesoyamma785 closed 2 months ago
So, starting off with a slight correction: The placement of that text on sites like NovelBin is randomized. This is taken from Chrome Devtools for the chapter you were looking at above.
So it's always hidden somewhere, but it's different on each generation of the page.
There are ways to clean the code of stuff like this, but these sites are constantly changing their own design to get around these work-arounds. I've personally always just edited out the BS and watermarks after generation is complete. Heck, in theory it's currently as easy as running $("span#span").remove(); before generation.
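For anyone post-processing saved chapter HTML outside a browser, the jQuery one-liner can be roughly approximated with plain string handling. This is only a sketch: the function name removeWatermarkSpans is made up for illustration, and it assumes the watermark element has exactly the id "span" and contains no nested <span> elements.

```javascript
// Hypothetical helper (not part of WebToEpub or jQuery): strips every
// <span id="span">...</span> element from an HTML string.
// Assumption: the watermark span contains no nested <span> elements.
function removeWatermarkSpans(html) {
    // [^>]* tolerates extra attributes; [\s\S]*? matches lazily across newlines.
    const spanPattern = /<span\b[^>]*\bid=["']span["'][^>]*>[\s\S]*?<\/span>/gi;
    return html.replace(spanPattern, "");
}

const sample = '<p>He drew his sword.<span id="span">placeholder watermark</span></p>';
console.log(removeWatermarkSpans(sample));
```

A regular expression is a blunt tool for HTML, so a real DOM parser would be the safer choice if one is available.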
You'll notice, even in the string that they use special characters to avoid general discovery:
DiisCoover 𝒖pdated novels on n(o)v./e/lbin(.)co𝒎
- 𝒎 instead of m, etc.
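Styled letters like these are typically Unicode "mathematical" variants, which compatibility normalization maps back to plain letters. A minimal sketch using the standard String.prototype.normalize; the function name foldLookalikes is made up for illustration, and U+1D48E is used as an example of such a styled m:

```javascript
// Hypothetical helper: folds compatibility variants (such as styled
// mathematical letters) back to their plain forms.
// Punctuation like the parentheses and slashes in the watermark is untouched.
function foldLookalikes(text) {
    return text.normalize("NFKD");
}

const disguised = "co\u{1D48E}"; // "co" followed by U+1D48E, a styled m
console.log(foldLookalikes(disguised));               // plain "com"
console.log(foldLookalikes(disguised) !== disguised); // true: the text changed
```

Normalization only undoes the letter substitution. The doubled letters and inserted punctuation would still need separate handling.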
I believe Calibre can handle this kind of batch editing, not 100% sure - never really used it. I generally use my C# project to whip up fixes when I need them, but I don't currently have it tuned for novelbin.
I'm hoping someone else can provide better details on how to remove it with existing tools.
@hesoyamma785 @Kiradien I'm thinking of adding code to the "EpubMerge" tool to clean this up. Basic idea:
Notes:
Looking at the actual HTML from the site, the watermark is embedded in the content. However, there's also a script to remove it. Something like:
const original11Content = $(this).html();
const updated11Content = original11Content.replace("Visitt nov𝒆lbin(.)c𝒐/m for the l𝒂test updates", `<span id="span">Visitt nov𝒆lbin(.)c𝒐/m for the l𝒂test updates</span>`);
I have a question. When I downloaded the novel, these tags came through as text, as you can see in the pictures I sent, not as on the site, where it is something like <span id="span">. Is this for this site only, or for all sites that do not have this <span id="span">?
@hesoyamma785
I'm having trouble understanding what you've written. So, I'll try and answer based on what I think you're asking.
1. I'm referring to the novelbin site, not any other.
2. The raw HTML for a page does NOT have the <span id="span"> element, just the "naked" watermark text.
3. However, there is a <script> element in the HTML that converts the "raw" watermark text into a <span id="span"> element when the page is viewed.
4. WebToEpub doesn't view the page, so the script never runs and the watermark text remains embedded in the content that WebToEpub packages into an epub.
Yes, it's as you said.
I have another question: if I block the script using uBlock Origin, will the watermark also be removed from the epub file, or is that pointless?
@hesoyamma785
It's pointless, because WebToEpub does not run the script in the first place. That's why you see the "watermark" in the epub.
FWIW, I don't think you can block the script with uBlock Origin, because the script is within the HTML page itself. You'd need something like NoScript, in which case you'd see the watermark in the text if you viewed the site's chapters with a browser (assuming the site works at all with scripts disabled).
Results so far. It's not hard to find the line of text with the NovelBin "watermark". The following seems to find nearly all of them:
let text = node.data;
return ((text.normalize('NFKD') != text) && (text.includes("(")))
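As a self-contained illustration, the check can be wrapped in a function and run against a couple of stand-in nodes. The function name looksLikeWatermark and the sample strings are made up here; `node` only needs a `data` string, as a DOM Text node has.

```javascript
// Hypothetical wrapper around the two-line check above.
// Flags text that changes under NFKD normalization AND contains "(".
function looksLikeWatermark(node) {
    const text = node.data;
    return (text.normalize("NFKD") !== text) && text.includes("(");
}

const nodes = [
    { data: "She opened the door (slowly)." },            // "(" but plain ASCII
    { data: "Visit n(o)vel site .co\u{1D48E} for more" }, // styled letter and "("
];
console.log(nodes.map(looksLikeWatermark)); // [ false, true ]
```

One caveat with this heuristic: ordinary story text can trip it too. For example, a line containing both "(" and the single-character ellipsis U+2026 is flagged, because NFKD expands that character into three dots.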
But finding where the wanted text ends and the watermark begins is proving to be much more difficult. I'm thinking I might need to have WebToEpub scan the <script> elements for the watermark text, and then use that to know what to remove. Which might be getting a bit too close to the line for Google's rules.
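A rough sketch of that idea, treating the script purely as text and never executing it. Both function names are made up, and the regular expression assumes the script still has the original11Content.replace("...", ...) shape quoted earlier in this thread, which the site could change at any time.

```javascript
// Hypothetical helper: pulls the first string argument of the
// original11Content.replace("...", ...) call out of a script's source text.
// Escape sequences inside the string are returned as written, not decoded.
function extractWatermark(scriptText) {
    const match = /original11Content\.replace\(\s*"((?:[^"\\]|\\.)*)"/.exec(scriptText);
    return match ? match[1] : null;
}

// Hypothetical helper: removes every occurrence of the extracted watermark.
// If no watermark is found, or it is an empty string, the content is returned unchanged.
function stripWatermark(contentHtml, scriptText) {
    const watermark = extractWatermark(scriptText);
    if (!watermark) {
        return contentHtml;
    }
    return contentHtml.split(watermark).join("");
}
```

Note that this removes all occurrences of the string, whereas the site's own replace() call with a string argument only touches the first one.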
Time spent: 130 minutes (so far).
@hesoyamma785
The watermarking seems to have stopped. The "original11Content.replace(" JavaScript is still there, as is the stylesheet element that hides the resulting <span>, but the watermark text was empty.
I just tried:
- https://novelbin.com/b/civil-servant-in-romance-fantasy#tab-chapters-title, first five chapters
- https://novelbin.me/novel-book/this-female-celebrity-comes-from-the-cultivation-world#tab-chapters-title, all chapters
Going to put this on hold. Please let me know if you see it again.
Time spent: 166 minutes (so far).
I think they changed their method or something like that, because if you look at the picture below, the tag is there: https://novelbin.com/b/cultivation-online-novel https://novelbjn.phieuvu.com/book/cultivation-online-novel/chapter-1596-primal-expanse
In the inspector, I set the span's visibility to 1 so that it would be shown, and unlike the previous situation, there isn't a <span> tag in the inspector.
@hesoyamma785
OK, I've just made a change. WebToEpub should now push the watermark into a <span> element with an id of "span", just like the site's JavaScript does when viewing in a browser. WebToEpub also marks the <span> as hidden, so MOST epub viewers should not show the element. (A few don't recognize the hidden attribute.)
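To illustrate the idea only (this is a made-up sketch, not WebToEpub's actual code), wrapping a known watermark string could look like the following. The function name hideWatermark is hypothetical, and the attribute is written as hidden="hidden" on the assumption that the output has to be well-formed XHTML.

```javascript
// Hypothetical sketch, not WebToEpub's real implementation.
// Wraps the first occurrence of a known watermark string in a hidden span.
function hideWatermark(contentHtml, watermark) {
    if (!watermark) {
        return contentHtml; // nothing to wrap (e.g. the site sent an empty string)
    }
    const wrapped = `<span id="span" hidden="hidden">${watermark}</span>`;
    // A function replacement keeps any "$" in the watermark from being
    // interpreted as a replacement pattern.
    return contentHtml.replace(watermark, () => wrapped);
}

console.log(hideWatermark("<p>Story text. WM</p>", "WM"));
```

Like the site's script, replace() with a string pattern only handles the first occurrence.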
If that's a problem, you can use EpubEditor to remove these <span> elements. Script not supplied, but https://github.com/dteviot/EpubEditor/issues/4 should provide enough information to do it yourself.
Note, WebToEpub probably won't handle the case where there is more than one different watermark on a page. (The 2nd and later watermarks will not be removed.) That seems to be a rare case, though, and I don't have an example of it to examine and figure out how to handle.
Test versions for Firefox and Chrome have been uploaded to https://github.com/dteviot/WebToEpub/releases/tag/developer-build. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes. Tested with:
Notes Time taken: 296 minutes (running total)
Thanks for your help and hard work ☺️
Reopening, so I know to notify you when the Chrome and Firefox stores are updated.
@hesoyamma785 Updated version (1.0.0.0) has been submitted to Firefox and Chrome stores. Firefox version is available now. Chrome might be available in a few hours to 21 days.
Describe the bug: This is about novelbin.com, which puts hidden tags in its chapter text. Look at these pictures.
To Reproduce Steps to reproduce the behavior:
Go to https://novelbin.com/b/civil-servant-in-romance-fantasy#tab-chapters-title
https://lightnovel.novelupdates.net/book/civil-servant-in-romance-fantasy/cchapter-5-i-was-dispatched-2