Closed mjfeintuch closed 5 years ago
I think it might be related to the typing speed. Have you tried to lower the speed of typing? I would also try to implement random delays between actions like clicking the submit button after filling out the form.
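For anyone who wants to try the random-delay idea, here is a minimal sketch. The helper names and the timing ranges are assumptions made up for illustration, not part of the stealth plugin; the only Puppeteer call relied on is page.type(selector, text, { delay }).

```javascript
// Sketch only: helper names and timing ranges are illustrative assumptions.

// Return a random integer delay in the inclusive range [minMs, maxMs].
function randomDelayMs(minMs, maxMs) {
  return Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
}

// Resolve after a random pause, e.g. between filling a form and clicking submit.
function randomPause(minMs, maxMs) {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}

// Type text one character at a time with a different delay for each key.
// `page` is expected to be a Puppeteer Page; only page.type() is used.
async function typeSlowly(page, selector, text, minMs = 80, maxMs = 250) {
  for (const ch of text) {
    await page.type(selector, ch, { delay: randomDelayMs(minMs, maxMs) });
  }
}
```

Usage would look roughly like: await typeSlowly(page, '#email', 'me@example.com'); await randomPause(500, 1500); await page.click('#submit'); (the selectors here are hypothetical).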
I'm being detected by Distil without even typing anything, just trying to see a website.
@bjesus did you use the Puppeteer stealth plugin or not? By the way, Distil Networks can use many heuristics to detect bots, so I don't think it is easy to find which heuristic is catching you.
Hi @shirshak55, yes, I did use the stealth plugin of course, otherwise I wouldn't report it here :)
I have the same issue.
Check whether a real browser has the same issue. They may be using the IP to check for bot activity, guys. And did you install any other extensions, additional fonts, etc.? Also, there is no point in using a proxy like Luminati, because Distil Networks has a list of all such proxies and can easily tell you are using a bot.
In a normal browser it just displays their site.
I'm not using any proxy. I didn't install any other packages. I didn't even go to https://www.distilnetworks.com/ programmatically, but by manually typing the URL in the Puppeteer Chromium window. They instantly detect it somehow.
@BorysTyminski using chromium?
And do they detect you only once, or every time? Sometimes a brand-new, fresh browser profile can look suspicious. Also, keep the user data folder between runs so they see you as the same user.
Doesn't Puppeteer use Chromium? I set headless: false, and I just pasted the URL and it instantly detected me. I have never opened this site with Puppeteer without the stealth plugin, so I doubt they saved me as a suspicious user.
@BorysTyminski we can use Chrome.
And if you are using Chromium, be sure to change the user agent. Which URL is it? Give it to me; I would like to test whether it detects me too.
https://www.distilnetworks.com/ Let me know if they detect you as well; maybe I'm doing something wrong. In my case it looks like this:
I dug a bit into their website source and I think this is the test we are failing. It is minified and probably just a bundle, so it's hard to understand, but the method names are still meaningful.
Also, on this site I get an error in the console in Puppeteer that I don't get in a normal browser:
VM226:14 Uncaught TypeError: getParameter is not a function
    at WebGLRenderingContext.getParameter (<anonymous>:14:18)
    at a.getWebglFp (zhrodsadknkfnugjasbebzzfzafscewueq.js:1)
    at a.webglKey (zhrodsadknkfnugjasbebzzfzafscewueq.js:1)
    at a.interrogate (zhrodsadknkfnugjasbebzzfzafscewueq.js:1)
    at zhrodsadknkfnugjasbebzzfzafscewueq.js:1
and warning:
The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page.
@shirshak55 did you check it? Do they detect you as well?
They detected me on that website only; on others I was not detected. It has to do with canvas, probably a fingerprinting issue.
What do you mean by "canvas, probably a fingerprinting issue"?
I checked document.createElement('canvas').getContext('webgl').getParameter(debugInfo.UNMASKED_VENDOR_WEBGL); like this:
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
but vendor and renderer are just fine:
I found something interesting when following this error on this site:
@BorysTyminski why not run a real browser and use the DevTools protocol to control it?
@shirshak55 so if I use Chrome instead of Chromium, Distil will not detect Puppeteer? Is that what you mean?
@BorysTyminski try this https://github.com/shirshak55/scrapper-tools/blob/master/src/fastPage.ts#L81
And please also try another URL protected by Distil Networks to make sure.
Why not try to solve the captcha?
Because the page doesn't even load, right?
How do you want to solve Google reCAPTCHA with 100% efficiency without human intervention? It's much easier to find out how they detect webdriver than to create such an AI.
Hey, just thought I would share my thoughts on this. Having read about their web bot mitigation product on their site (which I assume is what is being discussed here), if they are not lying or exaggerating about the features of their products, this is how it works:
The JS SDK does "Hi-Def fingerprinting analyzes over 200 device attributes", and the result is sent to and digested by the Distil Networks backend. This fingerprint is compared against the attributes your browser is expected to have (e.g. IE does not support web push notifications, but Chromium does, so if you set your user agent to IE, you need to disable support for web push). So start with Chromium masquerading as Chrome (inject all the basic stuff like languages etc.), or even point your Puppeteer at a Chrome instance (it should work almost the same way), and see if they still detect you. I tried this method against Google reCAPTCHA v3 and it worked just fine.
@BorysTyminski use 2captchas like this. https://github.com/shirshak55/scrapper-tools/blob/master/src/fastPage.ts#L34
I wonder what you mean by "so start with Chromium masquerading as Chrome (inject all the basic stuff like languages etc.), or even point your Puppeteer at a Chrome instance (it should work almost the same way)"? Is there any configuration besides this + user agent to make Puppeteer mimic Chrome? Thank you.
@keimao hey, you can run a real instance of Chrome, grab the WebSocket URL, and use CDP :)
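A rough sketch of the "grab the WebSocket URL" step, assuming Chrome was started with --remote-debugging-port (9222 below is just an example port). Chrome then serves a JSON document at /json/version that contains a webSocketDebuggerUrl field; the helper name extractWsEndpoint is made up for this example.

```javascript
// Sketch only: extractWsEndpoint is a hypothetical helper name.
// Input is the response body of http://127.0.0.1:<port>/json/version,
// where <port> is the value passed to --remote-debugging-port.
function extractWsEndpoint(versionJson) {
  const info = JSON.parse(versionJson);
  const url = info.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !url.startsWith('ws')) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return url;
}

// Usage (not run here; needs puppeteer installed and a running Chrome):
//   const puppeteer = require('puppeteer');
//   const body = await (await fetch('http://127.0.0.1:9222/json/version')).text();
//   const browser = await puppeteer.connect({
//     browserWSEndpoint: extractWsEndpoint(body),
//   });
```

Connecting this way means Puppeteer drives an already running browser instead of launching its bundled Chromium.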
I personally do not use this plugin directly; I use it mostly as a reference to make sure I am not missing something you guys have thought of. With that said, I don't believe this plugin can solve this issue all by itself (I could be wrong though).
The way Distil Networks performs its checks is as follows (this is how their main site's validation works; they might have different "delivery" methods for this validation, but they will all check the same things in the end):
Now, digging a bit deeper into the JS, I stumbled upon:
audioKey:function(e){return this.options.excludeAudio?e:(e.audio=this.getAudio(),e)},getAudio:function(){var e=document.createElement("audio"),t=!1;return(t=!!e.canPlayType)&&(t=new Boolean(t),t.ogg=e.canPlayType('audio/ogg; codecs="vorbis"')||"nope",t.mp3=e.canPlayType("audio/mpeg;")
They are checking whether the browser can play certain media types. The right answer depends on the user agent you are using. Let's say, hypothetically, Chrome v71 can't play audio/ogg but v72.1 can: you need to make sure your browser's features match what is expected from that browser and version. The above code is just a snippet of what they are checking.
So, to answer your question
"Are there any configuration outside of this + useragent to make puppeteer copy chrome?"
It depends on what user agent you are using. That's why I suggested pointing Puppeteer at a real Chrome instance and not lying about your user agent. If it works, you could then change the temp directory to get a fresh instance of Chrome (to avoid cookies and other state being shared across instances) and see if it's still working. Then, based on your needs, you can start lying about small things in your user agent.
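As a sketch of that setup, the flags for such a Chrome instance could be assembled like this. buildChromeArgs is a hypothetical helper; the flags themselves (--remote-debugging-port, --user-data-dir, --no-first-run) are standard Chrome switches. No user-agent override is passed, so the browser reports its real one.

```javascript
// Sketch only: buildChromeArgs is a hypothetical helper name.
// Builds command-line flags for a real Chrome that Puppeteer can attach to.
function buildChromeArgs({ port, profileDir }) {
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error('invalid port');
  }
  if (!profileDir) {
    throw new Error('profileDir is required');
  }
  return [
    `--remote-debugging-port=${port}`, // exposes the DevTools endpoint
    `--user-data-dir=${profileDir}`,   // a new directory gives a profile with no shared cookies
    '--no-first-run',                  // skip the first-run dialogs
  ];
}

// Usage (not run here; the Chrome binary path is an assumption):
//   const { spawn } = require('child_process');
//   spawn('/usr/bin/google-chrome',
//         buildChromeArgs({ port: 9222, profileDir: '/tmp/chrome-profile-1' }));
```

Pointing profileDir at a new directory gives the fresh instance described above; reusing the same directory keeps cookies between runs.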
@ahoura so why do you think this plugin cannot bypass their anti-bot system? If I understood your reply correctly, we just need to match all the Chrome/Chromium data (settings, window properties, etc.) with the crafted user agent. So currently something is inconsistent: for example (hypothetically) we have a Chrome v71 user agent yet we can play audio/ogg, and on detecting the inconsistency they send us to reCAPTCHA, right?
@BorysTyminski actually, this plugin has not been updated for about 4 months, and it's fair to say that developers at Distil Networks have already seen this type of plugin, so they must already know its flaws.
So the best solution is to use your real browser and automate it. That way they can never tell whether they are dealing with a real browser or a bot. And you can also automate captchas with services like 2captcha.
And fingerprinting is a real issue. Let's say you always use a fresh instance: Distil Networks might still know you are a bot, because they may have a rule like "somebody always arrives with a fresh instance (no cookies/sessions, same canvas, fonts, etc.) and from the same IP, so it must be a bot", even though the browser looks like real Chrome (not Puppeteer's Chromium).
And I think this bot detection has become a cat-and-mouse game, and I think in the future there is no way Distil Networks will be able to tell whether a browser is a bot or not :)
Yes, of course, it has been a cat-and-mouse game for a long time now.
Regarding using a real browser and automating it: I'll try it soon.
Regarding automating the captcha with a solving service: I don't really want to pay for solving reCAPTCHA. I know it's very cheap, but my code is currently non-profit.
Yes, that's the general idea: think of it as layers, and they can always add to these layers of security. But from what I know and what I have seen in their JS, one of their primary checks compares your browser's features against the browser you claim to be. The reason I said "I don't believe this plugin can solve this issue all by itself (I could be wrong though)" is simply the way it's been written. If you want to allow a randomly generated UA, then you need a reference of what each browser is capable of, then inject those features into the context and REMOVE all the other features so as not to raise any flags. This is difficult without access to a lot of traffic to fingerprint browsers the way Distil Networks does, so that you can lie to them without being detected. Hope that explains it.
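To make the "features vs. claimed browser" comparison concrete, here is a small sketch. Everything in it is an assumption for illustration: the capability table holds made-up values (echoing the hypothetical "v71 can't play ogg" example from this thread), and the function names are invented. It is not Distil's code.

```javascript
// HYPOTHETICAL capability table: values are invented for illustration only.
const EXPECTED_FEATURES = {
  'chrome-71': { oggVorbis: false, webPush: true },
  'chrome-72': { oggVorbis: true, webPush: true },
  'ie-11': { oggVorbis: false, webPush: false },
};

// Map a user-agent string to a key of the table above (very rough parsing).
function browserKey(userAgent) {
  const chrome = /Chrome\/(\d+)/.exec(userAgent);
  if (chrome) return `chrome-${chrome[1]}`;
  const ie = /Trident\/.*rv:(\d+)/.exec(userAgent);
  if (ie) return `ie-${ie[1]}`;
  return null;
}

// Return the names of features whose observed value differs from what the
// claimed browser is expected to report.
function findInconsistencies(userAgent, observed) {
  const key = browserKey(userAgent);
  const expected = key ? EXPECTED_FEATURES[key] : undefined;
  if (!expected) return ['unknown browser'];
  return Object.keys(expected).filter((f) => observed[f] !== expected[f]);
}
```

A non-empty result is the kind of mismatch that would raise a flag; building a real table would need the large amount of traffic mentioned above.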
I agree about the cat-and-mouse game, but I think in the future it will be a more advanced version of the same game. For example, Distil can digest your mouse movement and your behaviour on the page, compare it with that of others on the same page, and check whether you behave "regularly" on the page...
Fixed, please report if this issue still persists with the latest stealth version. :)
Seems it still persists, or the cat caught up with the mouse. E.g. I've used the latest version and it gets detected after 4 requests.
@clickstefan it could be the IP or many other things, right? Since it didn't get detected until after the 3rd request, I think you should try with a native Chrome browser (without using a bot) and report here.
My use case is simple, I just load the url and get the full html.
I think they don't immediately block as they are allowing some buffer to not block valid requests with slow js or slow internet.
I did load the site with a native browser, and even tried using an in-browser crawler extension, and everything works fine, hence my suspicion that the headless browser/Puppeteer fingerprint is being detected somehow.
I can confirm the IP is indeed what is being blocked, as changing it unblocks the requests.
You can read more about their methods here: https://www.imperva.com/products/bot-management/
Imperva collects and analyzes your bot traffic to pinpoint anomalies. Our machine learning models identify real-time bad bot behavior across our network and feed it through our known violators database. Biometric data validation, such as mouse movements, mobile swipe, and accelerometer data, catches malicious botnets. Rate limits based on device fingerprints — not IPs — provide further protection.
Try with a native browser. If it's an IP issue, then I don't think anybody can fix it unless you change the IP. You can easily purchase IPs from external services anyway.
As for VPNs, I can recommend luminati.io for residential IPs; it's easy to connect through Puppeteer's launch method.
@d0peCode Luminati is the worst. They asked me to show my face with my credit card, lol. It was a terrible experience. And Luminati provides proxies, not a VPN.
Uh, I heard a lot of good things and I actually used it without trouble. Maybe stormproxies.com then. Yes, it's proxies, not a VPN.
I find luminati to be very expensive compared to stormproxies.
Thanks for the advice, I don't think it is an IP issue, as same IP works fine in the browser, it's just that it detects the headless browser. If I find something I'll let you guys know. Best regards!
@clickstefan it can be an IP issue. When you run a headless browser it's empty: no cookies, etc. But when you try in a normal browser, it can be identified by its cookies etc. So Distil Networks might think "this fingerprint looks like a fresh browser, and this IP has already shown a similar fingerprint 2-3 times, so let's show a captcha to make sure" :)
@clickstefan It's usually not about the IP. DT detects the differences between a normal browser and your virtual browser (Puppeteer in this case).
I found that DT executes some JavaScript in our browser and sends the result back to the server. It's a long JSON, and some of its properties differ from those of a regular browser. Like my attachment below:
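The "fresh browser from the same IP" reasoning can be written down as a toy rule. This is purely hypothetical: the field names, the threshold of 3, and the function itself are invented to illustrate the guess, and nothing here is known about Distil's actual logic.

```javascript
// PURELY HYPOTHETICAL heuristic; field names and threshold are invented.
// previousVisits: array of { ip, fingerprint, hasCookies } seen earlier.
// visit: the incoming request, in the same shape.
function shouldChallenge(previousVisits, visit, maxFreshPerIp = 3) {
  // A browser that returns with cookies is not treated as "fresh".
  if (visit.hasCookies) return false;
  // Count earlier cookie-less visits with the same IP and fingerprint.
  const freshFromSameIp = previousVisits.filter(
    (v) => v.ip === visit.ip && !v.hasCookies && v.fingerprint === visit.fingerprint
  ).length;
  // Challenge once the same fresh-looking client has appeared too often.
  return freshFromSameIp >= maxFreshPerIp;
}
```

Under this toy rule the first three cookie-less requests pass and the fourth gets a captcha, which would look like the "detected after 4 requests" behaviour reported earlier, though that match may be a coincidence.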
What you can do is, before you hit your target page, set up Puppeteer to match the properties of a regular browser. One of my mistakes was the plugins property; previously I set it, but wrongly, like this:
const pluginData = 'something_not_null';
Instead it has to be like this:
const pluginData = [
  { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
  { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai', description: '' },
  { name: 'Native Client', filename: 'internal-nacl-plugin', description: '' }
];
Then follow it with Object.setPrototypeOf(formatted_plugin, Plugin.prototype);
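Putting those two pieces together, a sketch could look like the following. formatPlugins is a hypothetical helper, and the prototype is passed in as a parameter so the function can be tried outside a browser; inside the page you would pass Plugin.prototype.

```javascript
// Plugin entries as listed in the comment above.
const pluginData = [
  { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
  { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai', description: '' },
  { name: 'Native Client', filename: 'internal-nacl-plugin', description: '' },
];

// Sketch only: formatPlugins is a hypothetical helper name.
// Copies each entry and gives the copy the supplied prototype, so that
// `instanceof` checks against the matching constructor succeed.
function formatPlugins(data, pluginProto) {
  return data.map((entry) => {
    const formatted_plugin = { ...entry };
    Object.setPrototypeOf(formatted_plugin, pluginProto);
    return formatted_plugin;
  });
}

// In the browser (not run here), something along these lines would be
// executed before any page script, e.g. via page.evaluateOnNewDocument:
//   const plugins = formatPlugins(pluginData, Plugin.prototype);
```

This only covers the prototype step; how the resulting objects are exposed as navigator.plugins is left out, since the comment above does not show it.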
Good luck!
I am trying to automate logon to site that is using Distil. When I attempt to logon, I get a captcha.