ThinksFast closed this 2 months ago
@ThinksFast Thanks for bringing this to my attention! The issue has been fixed. #2bbe23c
@Ehsan-U Thanks for the quick fixes 🙂. I can confirm that the status code is getting correctly passed to the final response object.
But it looks like the JavaScript on the page is not getting rendered, and for the URL in the example above, the request is still getting blocked, even when I am not on a VPN.
Is there anything I can change in the configuration to render the JS, or improve my chances of not getting captcha checks?
@ThinksFast JS is being rendered properly. The 403 (captcha) appears even when I manually open the website in Chrome, so it seems the site applies the captcha consistently, even for real users.
@Ehsan-U Agreed on bypassing captchas, I'm sure no-driver won't get by all systems, but I'm hoping it's a lot better than Playwright.
But regarding the rendering of the HTML, when I print the HTML of the response object in the example code above, I get this in the logs:
<html>
<head>
<title>nodeposit365.com</title>
<style>
#cmsg {
animation: A 1.5s;
}
@keyframes A {
0% {
opacity: 0;
}
99% {
opacity: 0;
}
100% {
opacity: 1;
}
}
</style>
</head>
<body style="margin:0">
<p id="cmsg">Please enable JS and disable any ad blocker</p>
<script
data-cfasync="false">var dd = { 'rt': 'c', 'cid': 'AHrlqAAAAAMAGY6sUjlIluYA8DZhVQ==', 'hsh': '47FD5E27C3F4545A3AEA18602AAD93', 't': 'fe', 's': 16943, 'e': 'efdf718c9ab36b366ff60d84d929d6fc0e4986b45dc54b4c4ecba820da96d792', 'host': 'geo.captcha-delivery.com' }</script>
<script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script>
</body>
</html>
But when I open the page in the Chrome browser and copy the HTML from the dev tools inspector, I get this:
<html>
<head>
<title>nodeposit365.com</title>
<style>
#cmsg {
animation: A 1.5s;
}
@keyframes A {
0% {
opacity: 0;
}
99% {
opacity: 0;
}
100% {
opacity: 1;
}
}
</style>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body style="margin:0">
<script
data-cfasync="false">var dd = { 'rt': 'c', 'cid': 'AHrlqAAAAAMAizR_Iog0P0EAXoleBQ==', 'hsh': '47FD5E27C3F4545A3AEA18602AAD93', 't': 'fe', 's': 16943, 'e': '7c5c2cfb864f1a67075173025b3e05ff909bbf16202b7fb1e4284388bf5e45d7', 'host': 'geo.captcha-delivery.com' }</script>
<script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script><iframe
src="https://geo.captcha-delivery.com/captcha/?initialCid=AHrlqAAAAAMAizR_Iog0P0EAXoleBQ%3D%3D&hash=47FD5E27C3F4545A3AEA18602AAD93&cid=ThsqpJsRaY51M56Oa42aD9E4vVymnDcGm2ZR1DRQ764hJ5olzh4u_aC_SpvF7eJCyqElz77wdjvD44JJKacWpuEaXjLMzSPNERDo96zgywCGaxfMbab3EYL0LvdLDf0~&t=fe&referer=https%3A%2F%2Fwww.nodeposit365.com%2Fcasinos%2Fbc-game-casino%2F&s=16943&e=7c5c2cfb864f1a67075173025b3e05ff909bbf16202b7fb1e4284388bf5e45d7&dm=cd"
sandbox="allow-scripts allow-same-origin allow-forms" width="100%" height="100%" style="height:100vh;"
frameborder="0" border="0" scrolling="yes"></iframe>
</body>
</html>
The code is different, but notably, the code printed in the logs says "Please enable JS and disable any ad blocker", while the code rendered in my real Chrome browser does not have that text. So I don't think I'm getting the rendered HTML, just the initial response.
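To make the comparison between the two dumps above less manual, here is a minimal sketch that classifies an HTML string by the markers that differ between them: the `<p id="cmsg">` fallback message, the script loaded from `captcha-delivery.com`, and the `<iframe>` that only shows up in the Chrome copy. The names `classify_html` and `_ChallengeScanner` are hypothetical helpers, not part of no-driver or Scrapy, and the heuristic is an assumption based only on the two dumps in this thread, so other sites or challenge versions may look different.

```python
from html.parser import HTMLParser

# Host seen in both HTML dumps above (ct. and geo. subdomains).
CAPTCHA_HOST = "captcha-delivery.com"


class _ChallengeScanner(HTMLParser):
    """Collects the few signals that differ between the two HTML dumps."""

    def __init__(self):
        super().__init__()
        self.has_challenge_script = False  # <script src="...captcha-delivery.com...">
        self.has_challenge_iframe = False  # <iframe src="...captcha-delivery.com...">
        self.has_cmsg_fallback = False     # <p id="cmsg"> fallback message

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        if tag == "script" and CAPTCHA_HOST in src:
            self.has_challenge_script = True
        elif tag == "iframe" and CAPTCHA_HOST in src:
            self.has_challenge_iframe = True
        elif tag == "p" and attrs.get("id") == "cmsg":
            self.has_cmsg_fallback = True


def classify_html(html: str) -> str:
    """Return a rough label for the page based on the markers above.

    "challenge-rendered":   the captcha iframe is present, so the challenge
                            script appears to have run.
    "challenge-unrendered": challenge markers are present but no iframe,
                            which looks like the initial response.
    "no-challenge":         none of the markers were found.
    """
    scanner = _ChallengeScanner()
    scanner.feed(html)
    if scanner.has_challenge_iframe:
        return "challenge-rendered"
    if scanner.has_challenge_script or scanner.has_cmsg_fallback:
        return "challenge-unrendered"
    return "no-challenge"
```

Calling `classify_html(response.text)` inside the spider callback and logging the result next to `response.status` would show, per request, whether the body looks like the initial response or the rendered challenge page.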
I made a test spider to see how no-driver renders JavaScript content, and I'm seeing a strange issue where the original response gets a 403 status code, but the response object contains a 200 status code, and the HTML is raw / unrendered by Chrome.
Here is the test scraper I wrote:
And here is the log output from running scrapy crawl render_nodriver_test in my terminal:
You can see a 403 status code was received in the response, but a 200 is reported in the Response object. I'm also printing the HTML, which you can see is full of captcha references. The HTML is also not JavaScript-rendered.
I'm using a VPN for this request, which is probably why a captcha is getting triggered, but the mix of status codes and the un-rendered HTML seem like separate issues.
Did I configure the spider and no-driver settings correctly?