Rule contribution (lightnovelworld | Slow update to avoid ban)

AurelionSoldMe commented 4 months ago

Just contributing a custom rule

{
    "guid": "a804de22-86bb-4aac-a1bf-cae1addfc889",
    "rule_name": "lightnovelworld",
    "url_regex": "https:\\/\\/www\\.lightnovelworld\\.(?:com|co)\\/novel\\/([a-zA-Z0-9\\-\\/]+)",
    "pagetype_code": "return 0;",
    "toc_code": "injectJquery();\n\n// Extract the URL from the \"Novel Chapters\" link\nlet novelChaptersURL = $('a.grdbtn.chapter-latest-container').attr('href');\n\n// Fetch the content of the \"Novel Chapters\" page\nlet response = await fetch(novelChaptersURL);\n\n// Parse the HTML content\nlet htmlContent = await response.text();\n\n// Create a virtual DOM for parsing\nlet virtualDOM = $(htmlContent);\n\n// Extract base URL\nlet baseURL = window.location.origin;\n\n// Extract cover URL\nlet coverURL = virtualDOM.find('.novel-item .novel-cover img').attr('src');\n\n// Extract title\nlet title = virtualDOM.find('.novel-item h1 a.text2row').text().trim();\n\n// Extract summary\nlet summary = $('meta[itemprop=\"description\"]').attr('content');\n\n\n// Extract all chapter URLs\nlet chapterURLs = [];\n\n// Loop through each chapter list item\nvirtualDOM.find('.chapter-list li a').each(function () {\n let chapterLink = $(this).attr('href');\n // Prepend the base URL to the relative chapter URL\n chapterURLs.push(new URL(chapterLink, baseURL).href);\n});\n\n// Check if there's a next page for chapters\nlet nextPageLink = virtualDOM.find('.pagination-container a[rel=\"next\"]').attr('href');\n\n// If there's a next page, fetch its content and extract chapter URLs\nwhile (nextPageLink) {\n response = await fetch(nextPageLink);\n htmlContent = await response.text();\n virtualDOM = $(htmlContent);\n\n virtualDOM.find('.chapter-list li a').each(function () {\n let chapterLink = $(this).attr('href');\n // Prepend the base URL to the relative chapter URL\n chapterURLs.push(new URL(chapterLink, baseURL).href);\n });\n\n // Check if there's another next page\n nextPageLink = virtualDOM.find('.pagination-container a[rel=\"next\"]').attr('href');\n}\n\nlet retMe = {\n 'CoverURL': coverURL,\n 'Title': title,\n 'Summary': summary,\n 'ChapterCount': chapterURLs.length,\n 'ChapterURLs': chapterURLs,\n};\n\nreturn retMe;\n",
    "chapter_code": "// Ensure jQuery is injected\ninjectJquery();\n\n// Set to keep track of visited chapter URLs\nlet visitedChapterURLs = new Set();\n\nlet retMe;\n\n// Check if the current page is the Table of Contents (TOC) page\nif (isTOCPage()) {\n console.error(\"Error: Script is on the Table of Contents page. Not a chapter page.\");\n} else {\n let currentChapterURL = window.location.href;\n\n if (visitedChapterURLs.has(currentChapterURL)) {\n console.error(\"Error: Same chapter URL loaded twice. Breaking the script.\");\n }\n\n visitedChapterURLs.add(currentChapterURL);\n\n // Extract chapter title\n let chapterTitle = $('.chapter-title').text().trim();\n console.log('Chapter Title:', chapterTitle);\n\n // Scroll down to the bottom of the page\n await scrollDown();\n\n // Add an increased delay (8 seconds) to ensure content is loaded\n await sleep(8000);\n\n // Extract chapter content after scrolling\n let chapterContent = $('#chapter-container').html();\n console.log('Chapter Content:', chapterContent);\n\n // Get the URL of the next chapter\n let nextChapterPartialURL = getNextChapterURL();\n console.log('Next Chapter Partial URL:', nextChapterPartialURL);\n\n // Check if the next chapter URL is complete\n let nextChapterURL = null;\n if (nextChapterPartialURL) {\n // Get the base URL of the website\n let baseURL = window.location.origin;\n \n // Create the complete next chapter URL\n nextChapterURL = new URL(nextChapterPartialURL, baseURL).href;\n console.log('Next Chapter Complete URL:', nextChapterURL);\n } else {\n console.log('No Next Chapter Button found, or the next chapter has been visited before. It might be the last chapter.');\n }\n\n // Add a delay (3 seconds) after logging the next chapter URL\n await sleep(3000);\n console.log('Delay complete.');\n\n // Create the response object\n retMe = [\n {\n \"title\": chapterTitle,\n \"content\": chapterContent,\n \"nextURL\": nextChapterURL\n },\n ];\n}\n\nfunction isTOCPage() {\n return $('.toc-container').length > 0;\n}\n\n// Function to scroll down to the bottom of the page\nasync function scrollDown() {\n return new Promise(resolve => {\n let totalHeight = 0;\n let distance = 100;\n let scrollHeight = document.body.scrollHeight;\n\n let timer = setInterval(() => {\n window.scrollBy(0, distance);\n totalHeight += distance;\n\n if (totalHeight >= scrollHeight) {\n clearInterval(timer);\n resolve();\n }\n }, 50); // Reduced delay to 50 milliseconds\n });\n}\n\n// Function to get the URL of the next chapter\nfunction getNextChapterURL() {\n // Extract the URL of the next chapter link\n let nextChapterLink = $('a.button.nextchap');\n if (nextChapterLink.length > 0) {\n return nextChapterLink.attr('href');\n } else {\n // If no next chapter link is found, it means it's the last chapter\n // You can handle this case accordingly, for example, returning null or a special value\n return null;\n }\n}\n\n// Function for sleep/delay\nfunction sleep(ms) {\n return new Promise(resolve => setTimeout(resolve, ms));\n}\n\nreturn retMe;\n",
    "url_blocks": ""
  },
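
For anyone adapting a rule like this, it can help to sanity-check the `url_regex` against a few sample URLs before submitting. The snippet below is a minimal sketch, not part of Fanfiction-Manager: `matchesRule` is a hypothetical helper, the `/about` URL is a made-up non-novel page, and `RegExp.prototype.test` may not be exactly how the app applies the pattern. It compares a pattern whose alternation is not grouped around the TLD with a grouped form, assuming the intent is to match novel pages on both the `.com` and `.co` domains.

```javascript
// Hypothetical helper (not part of Fanfiction-Manager): checks whether a
// rule's url_regex string matches a given URL.
function matchesRule(urlRegex, url) {
  return new RegExp(urlRegex).test(url);
}

// Ungrouped alternation: reads as "<the .com origin>" OR ".co/novel/<slug>",
// so any page on the .com domain matches.
const ungrouped =
  '(?:https:\\/\\/www\\.lightnovelworld\\.com|.co\\/novel\\/([a-zA-Z0-9\\-\\/]+))';

// Grouped alternation (assumed intent): either TLD, then /novel/<slug>.
const grouped =
  'https:\\/\\/www\\.lightnovelworld\\.(?:com|co)\\/novel\\/([a-zA-Z0-9\\-\\/]+)';

const samples = [
  'https://www.lightnovelworld.com/novel/doomsday-spiritual-artifact-master',
  'https://www.lightnovelworld.com/about', // made-up non-novel page
];

for (const url of samples) {
  console.log(url, matchesRule(ungrouped, url), matchesRule(grouped, url));
}
```

Only the grouped form rejects the non-novel page, which is usually what a rule wants so that it does not claim URLs it cannot handle.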

gmastergreatee commented 4 months ago

Try this and tell me if it works better:

{
    "guid": "c6f1562f-2f0a-441b-861c-9255b7192455",
    "rule_name": "LightNovelWorld.com",
    "url_regex": "(?:https://)*(?:www\\.)*(lightnovelworld)\\.com/novel/",
    "pagetype_code": "// https://www.lightnovelworld.com/novel/doomsday-spiritual-artifact-master-1676/chapter-5\n// https://www.lightnovelworld.com/novel/doomsday-spiritual-artifact-master\n\nif (document.title.toLowerCase().includes('just a moment')) {\n  return -2;\n}\n\nreturn 0;",
    "toc_code": "injectJquery();\n\nif ($('#chapter-list-page').length) {\n  // means this is TOC page\n  return {\n    retry: 1,\n    nextURL: $('.booktitle').attr('href')\n  };\n}\n\nif ($('.booktitle').length) {\n  // means this is chapter page\n  return {\n    retry: 1,\n    nextURL: $('.booktitle').attr('href')\n  };\n}\n\n// only allow to go past if it is novel page\nif ($('.chapter-latest-container').length <= 0) {\n  throw Error('The page-type couldn\\'t be determined. Please re-check the URL.');\n}\n\n// compile list of chapters\nlet chListURL = $('.chapter-latest-container')[0].href;\nlet toc1 = await fetch(chListURL).then(x => x.text());\nlet tocPageCount = $(toc1).find('.pagination-container li').length - 1;\nlet tocs = [toc1];\nfor (let i = 1; i < tocPageCount; i++) {\n    tocs.push(await fetch(chListURL + '?page=' + (i + 1)).then(x => x.text()));\n}\nlet chapterURLs = [];\ntocs.forEach(x => {\n  chapterURLs.push(...Array.from($(x).find('li[data-chapterno] a')).map(x => x.href))\n});\n\nlet retMe = {\n  'CoverURL': $('.glass-background img')[0].src,\n  'Title': $('h1').text().trim(),\n  'Summary': Array.from($('.summary .content > p')).map(x => $(x).text().trim()).join('<br><br>'),\n  'ChapterCount': chapterURLs.length,\n  'ChapterURLs': chapterURLs,\n};\n\nreturn retMe;",
    "chapter_code": "injectJquery();\n\n$('#chapter-container > div').remove();\n\nlet retMe = [\n  {\n    \"title\": $('.chapter-title').text().trim().replace(/^chapter\\s+/gi, ''),\n    \"content\": $('#chapter-container').html(),\n    \"nextURL\": \"\"\n  },\n];\n\nreturn retMe;",
    "url_blocks": "fonts.google\n/content/\nvntsm.com\ncriteo.net\ninmobi.com\ngoogletagmanager\nstatic.lightnovelworld.com/lib/"
  },
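
The paging step in the `toc_code` of this rule comes down to building one URL per chapter-list page from the first page's URL. Here is a minimal standalone sketch of that idea, assuming the `?page=N` query parameter the rule uses. `buildTocPageURLs` is a hypothetical name, and the `/chapters` path in the example call is illustrative only, since the real URL is read from the `.chapter-latest-container` link.

```javascript
// Hypothetical helper: returns the URLs of all chapter-list pages,
// given the first page's URL and the total number of pages.
function buildTocPageURLs(chListURL, pageCount) {
  const urls = [chListURL]; // page 1 is the URL as-is
  for (let page = 2; page <= pageCount; page++) {
    const u = new URL(chListURL);
    // searchParams.set avoids producing "?a=b?page=2" if a query already exists
    u.searchParams.set('page', String(page));
    urls.push(u.href);
  }
  return urls;
}

// Illustrative input only; the path is an assumption.
console.log(
  buildTocPageURLs(
    'https://www.lightnovelworld.com/novel/doomsday-spiritual-artifact-master/chapters',
    3
  )
);
```

Keeping the URL building separate from the fetching makes it easier to check the page count logic on its own, which is the part most likely to break if the site changes its pagination markup.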

gmastergreatee commented 4 months ago

No need to wait 8 seconds or anything; most of the time the verification is bypassed automatically. If it isn't for any reason, just open the Renderer and spam-click the verification checkbox. I just downloaded a 270-chapter novel without needing to click anywhere.

AurelionSoldMe commented 4 months ago

I think the hella long wait time was there because it only downloaded part of the novel due to dynamic loading. Not sure if it still does that, considering they did switch up the description on me.
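
If a long fixed wait is only there to cover dynamic loading, one alternative is to poll the content until it stops changing, with the fixed wait kept as an upper bound. This is a rough sketch under assumptions, not something tested against the site: `waitForStableContent` is a hypothetical helper, and comparing content length between polls is just one simple heuristic for "finished loading".

```javascript
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: polls getContent() until its length has stayed the
// same for `stableChecks` consecutive polls, or until `timeoutMs` has passed.
// Returns the last content seen either way.
async function waitForStableContent(
  getContent,
  { intervalMs = 250, stableChecks = 3, timeoutMs = 8000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  let last = getContent();
  let stable = 0;
  while (stable < stableChecks && Date.now() < deadline) {
    await sleep(intervalMs);
    const current = getContent();
    // Reset the counter whenever the content length changes
    stable = current.length === last.length ? stable + 1 : 0;
    last = current;
  }
  return last;
}
```

In a chapter rule this might be called as `await waitForStableContent(() => $('#chapter-container').html() || '')`, so a page that loads quickly does not pay the full 8 seconds.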

AurelionSoldMe commented 4 months ago

10/10, just tested your rule now. Way better than mine. It still gets caught by the captcha every now and then, but it's way faster.

No clue why my fast version would miss content or get hit by the 1-day timeout originally.

gmastergreatee commented 4 months ago

OK, thanks for reporting. I've added the rule to the rules.json file.

gmastergreatee / Fanfiction-Manager

Rule contribution (lightnovelworld | Slow update to avoid ban) #20