berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License

stealth mode being detected by Amazon login in headless mode #67

Closed moltar closed 4 years ago

moltar commented 4 years ago

I'm not sure what they detect or how, but stealth mode does not get past it: Amazon throws up a captcha.

This is very consistent.

It works fine in headful mode. I get a captcha every time in headless mode.

Any tips for debugging this?

Thanks.

shravan2x commented 4 years ago

Could you provide a repro? Which Amazon login page is this?

moltar commented 4 years ago

I think I have a fix. Please email me at pupextra67@fddf.net

berstend commented 4 years ago

@moltar would you mind sharing your findings? :)

heathera2016 commented 4 years ago

Amazon lets it pass once and then always blocks it, whether in headful or headless mode. How can I use stealth mode on Amazon without getting stuck?

berstend commented 4 years ago

@heathera2016 without further info it's hard to debug :) Did you try running the headless tests mentioned in the Readme of the stealth plugin?

Maybe there's a new way to detect headless that we need to implement a fix for.

Also: Depending on the situation you could consider pointing puppeteer to a full Google Chrome instead of using the built-in Chromium.
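
For reference, a minimal sketch of what pointing puppeteer at an installed Chrome could look like. `executablePath` is a regular puppeteer launch option; the helper names `buildLaunchOptions` and `launchWithChrome` and the `CHROME_PATH` environment variable are assumptions made up for this example, and the actual path to Chrome differs per machine.

```javascript
// Sketch: launch puppeteer-extra with the stealth plugin against an installed
// Google Chrome instead of the bundled Chromium.

// Builds the options object that is passed to puppeteer.launch().
// `chromePath` is an assumption: where Chrome lives depends on the machine.
function buildLaunchOptions(chromePath, headless = true) {
  const options = { headless };
  if (chromePath) {
    // executablePath tells puppeteer which browser binary to start.
    options.executablePath = chromePath;
  }
  return options;
}

async function launchWithChrome(chromePath) {
  // Loaded lazily so the helper above can be used without the packages installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());
  return puppeteer.launch(buildLaunchOptions(chromePath));
}

// Example (not run here); CHROME_PATH is a hypothetical environment variable:
// launchWithChrome(process.env.CHROME_PATH).then((browser) => browser.close());
```

If no path is given, the options fall back to whatever browser puppeteer would start by default.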

heathera2016 commented 4 years ago

@berstend First of all, thanks for your great plugin. I've investigated, and this is what I have found so far.

  1. In headful mode, Amazon blocks me and asks for a captcha once I have visited twice with puppeteer.
  2. Once I'm stuck like this, stealth mode is useless until the captcha has been solved once.
  3. In headful mode, if I set "userDataDir: './tmp'" in the launch options, Amazon does not block me and the robot check is bypassed.
  4. Without the "userDataDir" option, I tested headless mode as you suggested: Amazon does not block me as long as I have not been blocked before.
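
The combinations in the list above can be enumerated so that each one is tried against the same page. This is only a sketch: `buildTestMatrix` is a hypothetical helper, and each `options` object is meant to be passed to `puppeteer.launch()`.

```javascript
// Sketch: enumerate headful/headless x with/without userDataDir.
function buildTestMatrix(profileDir = './tmp') {
  const matrix = [];
  for (const headless of [false, true]) {
    for (const persistent of [false, true]) {
      const options = { headless };
      // userDataDir makes the browser reuse one profile directory between runs.
      if (persistent) options.userDataDir = profileDir;
      matrix.push({
        name: `${headless ? 'headless' : 'headful'}, ${persistent ? 'with' : 'without'} userDataDir`,
        options,
      });
    }
  }
  return matrix;
}

// Print the four configurations.
for (const { name, options } of buildTestMatrix()) {
  console.log(name, JSON.stringify(options));
}
```

Since a shared profile directory carries state from one run to the next, a separate directory per configuration is probably needed to keep the runs independent.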

<1> I have a question: why does stealth mode work in headless mode only?
<2> Do you know why Amazon's block depends on whether userDataDir is set? I thought it was related to cookies at first, but that does not seem to be the case. I'm not sure, so I'm still researching. Many thanks again!

berstend commented 4 years ago

> I have a question. May I know the reason stealth mode is working in headless mode only?

Mhm? The stealth plugin should work in both headless and headful mode. It cannot guarantee 100% success though; the whole detection evasion thing is a cat and mouse game :)

> Do you know the reason Amazon's block is determined by existence of userDataDir?

Cookies are the first thing that comes to mind, but nowadays a bunch of other things can be used to store session identifiers as well (localStorage, etc.). You could inspect the Application tab in DevTools to see what the site in question is storing. :)
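
The same inspection can be done from a script, which makes it easier to compare runs with and without userDataDir. A rough sketch: `page.cookies()` and `page.evaluate()` are puppeteer methods (newer puppeteer releases may prefer a browser-level cookie API), while `collectSessionState` and `summarizeState` are hypothetical helper names for this example.

```javascript
// Sketch: collect what a page has stored, then reduce it to names only.

// Expects a puppeteer Page that has already navigated to the site.
async function collectSessionState(page) {
  const cookies = await page.cookies();
  // Runs inside the page, so it may only use variables defined in its own body.
  const storage = await page.evaluate(() => {
    const dump = (s) => {
      const out = {};
      for (let i = 0; i < s.length; i++) {
        const key = s.key(i);
        out[key] = s.getItem(key);
      }
      return out;
    };
    return { local: dump(localStorage), session: dump(sessionStorage) };
  });
  return { cookies, storage };
}

// Pure helper: keeps only cookie names and storage keys, sorted for easy diffing.
function summarizeState({ cookies, storage }) {
  return {
    cookieNames: cookies.map((c) => c.name).sort(),
    localStorageKeys: Object.keys(storage.local).sort(),
    sessionStorageKeys: Object.keys(storage.session).sort(),
  };
}
```

Diffing the summary from a blocked run against one from a run that passed would show which stored items differ, although it cannot tell which of them the site actually checks.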

heathera2016 commented 4 years ago

@berstend Thanks for your kind reply. Have a good one! :D 👍

berstend commented 4 years ago

@heathera2016 no worries :) Closing this for now, but let me know if you think we can improve the stealth plugin somewhere :)