dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.
https://pypi.org/project/lightnovel-crawler/
GNU General Public License v3.0
1.44k stars, 281 forks

Fix this source: ranobes.top #2137

Open archziac opened 12 months ago

archziac commented 12 months ago

I'm not able to download from this source.

camp00000 commented 7 months ago

The downloader seems to break because of an automatic Google reCAPTCHA that pops up once too many requests have been made or automated requests are detected.

This happens while it fetches the chapter data: for some novels it has to load over 100 pages of chapter links, because no single page seems to contain all of that data.
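
A minimal sketch of what a slower, stop-on-block paging loop could look like. Everything here is an assumption for illustration: `fetch_page`, `is_blocked`, and the 2-second default delay are hypothetical names and values, not lightnovel-crawler's actual API, and whether spacing requests out avoids the captcha on this site is untested.

```python
import time


class FloodGuardTriggered(Exception):
    """Raised when a fetched page looks like a block page instead of chapter links."""


def fetch_chapter_pages(fetch_page, page_count, is_blocked,
                        delay_seconds=2.0, sleep=time.sleep):
    """Fetch pages 1..page_count one at a time, pausing between requests.

    fetch_page(page_number) -> str and is_blocked(html) -> bool are
    hypothetical callables supplied by the caller.
    """
    pages = []
    for page_number in range(1, page_count + 1):
        if page_number > 1:
            sleep(delay_seconds)  # space requests out instead of bursting
        html = fetch_page(page_number)
        if is_blocked(html):
            # Stop immediately: further requests would likely hit the same page.
            raise FloodGuardTriggered(f"blocked on page {page_number}")
        pages.append(html)
    return pages
```

Passing `sleep` in as a parameter keeps the loop easy to exercise without real waiting; with 100+ pages even a short delay adds minutes to a download, so this trades speed for fewer requests per second.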

I haven't found a reCAPTCHA bypass in any of the other sources. At most there are somewhat feasible Cloudflare bypasses via Selenium, but nothing that covers Google's reCAPTCHA.

@dipu-bd do you know of any similar cases where reCAPTCHA was bypassed? Or do you think implementing something like this (although not updated for 3 years...) is worth a shot?

The HTML of such a reCAPTCHA page looks like this:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<meta content="unsafe-url" name="referrer"/>
<meta content="true" name="HandheldFriendly"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="user-scalable=no, initial-scale=1.0, maximum-scale=1.0, width=device-width" name="viewport"/>
<meta content="yes" name="apple-mobile-web-app-capable"/>
<meta content="default" name="apple-mobile-web-app-status-bar-style"/>
<title>Ranobes Flood Guard</title>
<style> .info{text-align: center; max-width: 630px; margin: 10% auto; font-size: 18px; font-family: Helvetica, "Trebuchet MS", Verdana, sans-serif;}.antibot-btn-success{width: 300px; line-height: 3.5; margin: 20px auto 20px auto; font-size: 16px; font-weight: 600; color: #fff; cursor: pointer; height: 55px; text-align:center; border: none; background-size: 300% 100%; border-radius: 50px; moz-transition: all .4s ease-in-out; -o-transition: all .4s ease-in-out; -webkit-transition: all .4s ease-in-out; transition: all .4s ease-in-out;}.antibot-btn-success:hover{background-position: 100% 0; moz-transition: all .4s ease-in-out; -o-transition: all .4s ease-in-out; -webkit-transition: all .4s ease-in-out; transition: all .4s ease-in-out;}.antibot-btn-success:focus{outline: none;}#content > .antibot-btn-success{background-image: linear-gradient( to right, #0ba360, #3cba92, #30dd8a, #2bb673 ); box-shadow: 0 4px 15px 0 rgba(23, 168, 108, 0.75);}.antibot-btn-color{cursor: pointer; padding: 14px 14px; text-decoration: none; display: inline-block; width: 14px; height: 16px;}.antibot-btn-color:hover{border: 2px solid #ccc; width: 10px; height: 10px;}
html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt, var, b, u, i, center, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, table, caption, tbody, tfoot, thead, tr, th, td,article, aside, canvas, details, embed, figure, figcaption, footer, header, hgroup, menu, nav, output, ruby, section, summary, time, mark, audio, video{margin: 0; padding: 0; border: 0; font-size: 100%; font: inherit; vertical-align: baseline}article, aside, details, figcaption, figure, footer, header, hgroup, menu, nav, section{display: block}body{line-height: 1}ol, ul{list-style: none}blockquote, q{quotes: none}blockquote:before, blockquote:after, q:before, q:after{content: ''; content: none}table{border-collapse: collapse; border-spacing: 0}article,aside,details,figcaption,figure,footer,header,hgroup,nav,section,summary{display: block;}body, html{height: 100%;}html, html a{-webkit-font-smoothing: antialiased;text-shadow: 1px 1px 1px rgba(0,0,0,0.004);} body, select, input, textarea, button{font: normal 14px/1.5 "GothaPro", Arial, Helvetica, sans-serif;letter-spacing: 0.012em;color: #ccc;outline: none;}a{outline: none; color: #4e8cda; text-decoration: none;}a:hover{text-decoration: underline;}a img{border: 0 none;}a > img{vertical-align: bottom;}b{font-weight: bold;}.clr{clear: both}.clrfix:after{clear: both; content: ""; display: block; height: 0; width: 0; visibility: hidden}body{background: #1a1a1a;}.offpage{display: table; position: absolute; left: 0; top: 0; width: 100%; height: 100%;}.wrap{display: table-cell;vertical-align: middle;text-align: center;padding: 20px;}.wrap_in{max-width: 600px; margin: 0 auto;}.wrap .title{font-size: 2em;display: block;margin: 1.5em 0 1.5em 0;font-weight: bold;}.logo{text-align: center;}.logo > img{max-width: 130px; vertical-align: top;}.footer{display: table-row;}.copyright{display: table-cell;height: 1%;padding: 20px;line-height: 20px;text-align: center;font-size: .8em;background-color: #181818;color: #999;}.copyright a{color: #999;}.copyright a:hover{color: inherit;}</style>
</head>
<body>
<div class="offpage">
<div class="wrap">
<div class="wrap_in">
<div class="title">Antiflood</div>
<div class="reason">
<form method="post">
<p>Hello, dear visitor!<br/>Our system has detected abnormal activity from your IP address<br/>Wait for (1-4 seconds) <b> Google reCAPTCHA</b> version 3 to appear, and please confirm that you are not a robot</p>
<div class="info" id="content">
<input id="g-recaptcha-response" name="g-recaptcha-response" type="hidden" value=""/>
<button class="antibot-btn-success" name="submit" style="cursor: pointer;" type="submit">I'm not a robot.</button>
</div>
</form>
</div>
</div>
</div>
<div class="footer">
<p class="copyright"><strong>Ranobes</strong> online</p>
</div>
</div>
<script src="/engine/classes/js/jquery.js?v=af30c1"></script>
<script src="https://www.google.com/recaptcha/api.js?render=6Lf0EBUlAAAAAO4CpepmrVgl0pkasGlCJeETtAsP"></script>
<script>grecaptcha.ready(function() {grecaptcha.execute('6Lf0EBUlAAAAAO4CpepmrVgl0pkasGlCJeETtAsP', {action: 'pm'}).then(function(token) {$('#g-recaptcha-response').val(token);});});</script>
<script>(function(){var js = "window['__CF$cv$params']={r:'85165f24ca97698a',t:'MTcwNzI1Mzc0MC41MDYwMDA='};_cpo=document.createElement('script');_cpo.nonce='',_cpo.src='/cdn-cgi/challenge-platform/scripts/jsd/main.js',document.getElementsByTagName('head')[0].appendChild(_cpo);";var _0xh = document.createElement('iframe');_0xh.height = 1;_0xh.width = 1;_0xh.style.position = 'absolute';_0xh.style.top = 0;_0xh.style.left = 0;_0xh.style.border = 'none';_0xh.style.visibility = 'hidden';document.body.appendChild(_0xh);function handler() {var _0xi = _0xh.contentDocument || _0xh.contentWindow.document;if (_0xi) {var _0xj = _0xi.createElement('script');_0xj.innerHTML = js;_0xi.getElementsByTagName('head')[0].appendChild(_0xj);}}if (document.readyState !== 'loading') {handler();} else if (window.addEventListener) {document.addEventListener('DOMContentLoaded', handler);} else {var prev = document.onreadystatechange || function () {};document.onreadystatechange = function (e) {prev(e);if (document.readyState !== 'loading') {document.onreadystatechange = prev;handler();}};}})();</script></body>
</html>

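
For reference, a page like the one above could be recognized before it is parsed as a chapter list. The following is only a sketch using Python's standard `html.parser`. The two markers it checks (the `Flood Guard` title and the `g-recaptcha-response` hidden input) are taken from the HTML above, while the class and function names are hypothetical and not part of lightnovel-crawler. It detects the page but does nothing to bypass it.

```python
from html.parser import HTMLParser


class FloodGuardDetector(HTMLParser):
    """Collects the markers that identify the anti-flood page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.has_token_input = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "input" and dict(attrs).get("id") == "g-recaptcha-response":
            # Hidden field that the page's script fills with the reCAPTCHA token.
            self.has_token_input = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_flood_guard_page(html: str) -> bool:
    """Return True if the HTML looks like the Ranobes anti-flood page."""
    detector = FloodGuardDetector()
    detector.feed(html)
    return "Flood Guard" in detector.title or detector.has_token_input
```

A crawler could call `is_flood_guard_page(response_text)` on each response and stop with a clear error instead of failing later on missing chapter links. If the site changes its title or field id, the check would need updating.
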
zGadli commented 7 months ago

@camp00000 This repo uses undetected-chromedriver to bypass captchas and bot detection, but Ranobes really dislikes bots on their site. They even built their own custom captcha for bot detection instead of relying on a traditional one. I would recommend that @archziac find another source that works.

camp00000 commented 7 months ago

Thanks for the info, @zGadli. In that case I won't look into this any further.