dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

The contents of the chapters on the 69shuba.pro website are confused #1359

Closed. LisArits closed this issue 1 week ago

LisArits commented 2 months ago

https://www.69shuba.pro/book/57866/

When downloading chapters, the contents get mixed up, for example in chapter 104.

Screenshots of the original chapter on the site: [screenshot 1], [screenshot 2, continuation]

Screenshot of the downloaded chapter 104: [screenshot 3]

After the line "等众人过了那个劲,慢慢接受这破碎现实。" (it follows the line highlighted in blue in the screenshot), the text becomes scrambled, and this continues at least until chapter 110. I haven't checked further, but I think the rest will be the same.

LisArits commented 2 months ago

I tried copying the text manually and the chapters still get mixed up, so it seems there is some kind of copy protection. Is there a way to bypass it?

LisArits commented 2 months ago

Hmm, it's strange: today I tried downloading chapters 104 to 262 and they seem normal, not mixed up. But I have also downloaded other stories besides this one, and most often, after the first 100-200 chapters out of 600 (or more), the rest (201-600) come out mixed up. This is probably some kind of copy protection and, as I wrote above, it is triggered by any copying, whether by software (WebToEpub) or by hand (Ctrl+C). Is there any way to bypass this?

dteviot commented 2 months ago

@LisArits

I'm sorry, I don't know Chinese, so I can't tell what's been "mixed up". Looking at the raw HTML, there are no immediately obvious tags to indicate what's scrambled or how.

X-Xadro commented 2 months ago

I was just about to open an issue for this because I have the exact same problem. I tried downloading through 'Lightnovel Crawler' and it had exactly the same issue. MTLNation used to have this too when the site still existed.

It is most likely a script: when I block scripts on the site with uBlock, the page shows exactly the same wrong paragraph order as the WebToEpub download, and so does the page source viewed in my browser.
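One way to check this guess is to compare the paragraphs in the raw page source with the paragraphs the browser shows once scripts have run. The helper below is only a sketch: `compareParagraphOrder` is a hypothetical name, and it assumes both paragraph lists have already been extracted as plain strings.

```javascript
// Hypothetical diagnostic helper, not part of WebToEpub.
// Classifies how the paragraphs of the raw HTML relate to the rendered ones.
function compareParagraphOrder(rawParagraphs, renderedParagraphs) {
    // Collapse whitespace so formatting differences do not count as changes.
    const normalize = (s) => s.replace(/\s+/g, " ").trim();
    const raw = rawParagraphs.map(normalize).filter(Boolean);
    const rendered = renderedParagraphs.map(normalize).filter(Boolean);

    const sameSequence = (a, b) =>
        a.length === b.length && a.every((p, i) => p === b[i]);

    if (sameSequence(raw, rendered)) {
        return "identical";
    }
    // Same paragraphs in a different order points at script-side reordering.
    if (sameSequence([...raw].sort(), [...rendered].sort())) {
        return "reordered";
    }
    return "different-content";
}
```

If the result is "reordered", the text is all present in the source and only its order is changed by a script, which would match the uBlock observation above.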

gamebeaker commented 1 month ago

As an idea, maybe we could: create an iframe -> inject JS through chrome.scripting -> load the chapter -> wait until the iframe has finished loading -> return the DOM -> destroy the iframe. That would be slow, but it could be a solution if JS is used on the site to unscramble the content. Pro: the scripting permission is already in manifest.json.

X-Xadro commented 1 month ago

Although I haven't fully tested it, at first glance it seems that since they changed the website again, this protection feature has either been removed or is no longer applied to the story I tested.

They definitely changed something big, because they are finally using <p></p> like most websites, where before they used <br /><br />.
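A parser that has to cope with both the old and the new markup could look roughly like this. It is a sketch under assumptions: `splitParagraphs` is a hypothetical name, the input is the chapter's inner HTML as a string, and regular expressions are used only to keep the example short (a real parser would walk the DOM).

```javascript
// Hypothetical helper, not WebToEpub's actual parser.
// Splits chapter HTML into plain-text paragraphs for either markup style.
function splitParagraphs(html) {
    // New style: each paragraph is wrapped in <p>...</p>.
    const wrapped = [...html.matchAll(/<p\b[^>]*>([\s\S]*?)<\/p>/gi)].map((m) => m[1]);
    // Old style: paragraphs are separated by two or more <br> tags.
    const pieces = wrapped.length > 0
        ? wrapped
        : html.split(/(?:<br\s*\/?>\s*){2,}/i);
    // Strip any remaining tags and drop empty fragments.
    return pieces
        .map((s) => s.replace(/<[^>]+>/g, "").trim())
        .filter((s) => s.length > 0);
}
```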

Also worth noting: I had this problem only with a story that was being actively updated. Older stories that were already completed downloaded properly.

They probably installed flood protection instead of this scrambling, hence #1405. Before, you could download from 69Shuba with basically no interval between requests.

gamebeaker commented 1 month ago

I just found this in #1162:

It's a violation of the Chrome store's policy for an extension to bypass anti-copy measures. Doing so will get it banned. If you want to do this for yourself, I'd suggest you try opening a tab, and then injecting a content script into the tab. https://developer.chrome.com/docs/extensions/develop/concepts/content-scripts

Originally posted by @dteviot in https://github.com/dteviot/WebToEpub/issues/1162#issuecomment-1890840147
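For anyone trying the quoted suggestion on their own machine, a rough sketch of "open a tab, then inject a script" might look like the following. It is not WebToEpub code: `fetchRenderedHtml` and its `api` parameter are hypothetical names, and it assumes a Manifest V3 extension (promise-based `chrome.tabs` and `chrome.scripting`) that has the `scripting` permission and host access to the target site.

```javascript
// Hypothetical sketch for personal use only; see the policy note quoted above.
// `api` defaults to the extension's `chrome` object and can be replaced in tests.
async function fetchRenderedHtml(url, api = globalThis.chrome) {
    // Open the chapter in a background tab so the site's own scripts run.
    const tab = await api.tabs.create({ url, active: false });
    try {
        // Wait until the tab reports that loading is complete.
        await new Promise((resolve) => {
            const listener = (tabId, changeInfo) => {
                if (tabId === tab.id && changeInfo.status === "complete") {
                    api.tabs.onUpdated.removeListener(listener);
                    resolve();
                }
            };
            api.tabs.onUpdated.addListener(listener);
        });
        // Run a function inside the page and return the DOM as it looks after scripts ran.
        const [injection] = await api.scripting.executeScript({
            target: { tabId: tab.id },
            func: () => document.documentElement.outerHTML,
        });
        return injection.result;
    } finally {
        // Always close the tab, even if injection failed.
        await api.tabs.remove(tab.id);
    }
}
```

Note that "complete" only means the page has finished loading. If the site rearranges the text some time after load, an extra wait before injecting would probably be needed, and opening one tab per chapter will be slow.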

gamebeaker commented 1 week ago

A couple of days ago I re-downloaded the novel I'd had issues with, because it had new chapters, and just now I checked a couple of chapters, including the one that was broken for me last time. They were exactly as on the site itself, so it seems to be 'fixed'.

A while ago they also had a different protection, where they pasted your IP address into the chapters if you used a downloader to get the story. So I guess they are experimenting, or the site changes hands with every domain change lol

Originally posted by @X-Xadro in https://github.com/dteviot/WebToEpub/issues/1458#issuecomment-2328746573