EricJMarti / inventory-hunter

⚡️ Get notified as soon as your next CPU, GPU, or game console is in stock
MIT License
1.12k stars, 263 forks

Adorama failing to connect with a 403 error #106

Closed: TK-SpartanGolf6 closed this issue 3 years ago

TK-SpartanGolf6 commented 3 years ago

I'm scraping eight Adorama RTX 3080 listings, and they all fail with the same error:

Caught except during request: got response with status code 403 for (Adorama link)

I believe Adorama has strong anti-scalping/anti-bot protection: when I visit Adorama manually, it asks me to complete a reCAPTCHA. I suspect Adorama is a lost cause, but does anyone know how to get around this, or whether a later update will make scraping Adorama work?
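
For context, the error quoted above is what a scraper typically reports when the server answers with a non-success status. The following is a minimal sketch of that pattern, not inventory-hunter's actual code: `ScrapeError`, `check_response`, and `describe_failure` are hypothetical names chosen for illustration.

```python
class ScrapeError(Exception):
    """Hypothetical error type for a page that could not be fetched."""


def check_response(status_code: int, url: str) -> None:
    """Raise ScrapeError for any non-2xx status; return None otherwise."""
    if not 200 <= status_code < 300:
        raise ScrapeError(f"got response with status code {status_code} for {url}")


def describe_failure(status_code: int) -> str:
    """Give a rough hint about what a status code usually means for a scraper."""
    if status_code == 403:
        # The server understood the request but refuses to serve it,
        # which is a common reaction to traffic flagged as automated.
        return "forbidden: the site likely flagged the request as automated"
    if status_code == 429:
        return "too many requests: slow down the polling interval"
    if 500 <= status_code < 600:
        return "server error: retry later"
    return "unexpected status"
```

A 403 differs from a 429 or a 5xx in that retrying the same request unchanged is unlikely to help; the request itself usually has to look different to the server.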

MRizkBV commented 3 years ago

I think the script would need to support rotating user agents to avoid the CAPTCHA? That's not something I can do myself, but hopefully someone else can.

EricJMarti commented 3 years ago

Challenge accepted!

Adorama does have some serious protections in place, so hopefully these changes will help with that: https://github.com/EricJMarti/inventory-hunter/commit/a7ccd4a451433a7c87ca6eaf6ec75e142c83495b

I also added first-class support for Adorama: https://github.com/EricJMarti/inventory-hunter/commit/6504842f31adf5ee47239d07c301c5678cf82c30

These changes are building and should be available on Docker Hub in about 20 minutes.

@MRizkBV I think rotating user agents would do more harm than good, since some detection mechanisms validate whether the user agent makes sense. See: https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth--7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html
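
To illustrate the point about user agents needing to "make sense", here is a toy sketch of one consistency check a detection mechanism could plausibly run: comparing the operating system named in `User-Agent` against the `Sec-CH-UA-Platform` client hint. This is an assumption for illustration only. It is not how Adorama's protection or inventory-hunter works, and `PROFILES`, `platform_from_user_agent`, and `looks_consistent` are hypothetical names.

```python
from typing import Dict, Optional

# Illustrative header profiles: each user agent is paired with the platform
# hint that a browser on that operating system would send alongside it.
PROFILES = [
    {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "Sec-CH-UA-Platform": '"macOS"',
    },
    {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "Sec-CH-UA-Platform": '"Windows"',
    },
]


def platform_from_user_agent(user_agent: str) -> Optional[str]:
    """Guess the platform from well-known tokens in a user agent string."""
    if "Macintosh" in user_agent:
        return "macOS"
    if "Windows NT" in user_agent:
        return "Windows"
    return None


def looks_consistent(headers: Dict[str, str]) -> bool:
    """Return True if the platform hint agrees with the user agent's OS token."""
    claimed = platform_from_user_agent(headers.get("User-Agent", ""))
    hinted = headers.get("Sec-CH-UA-Platform", "").strip('"')
    return claimed is not None and claimed == hinted


# Swapping only the User-Agent, as naive rotation does, leaves the other
# headers describing a different machine.
rotated = dict(PROFILES[0], **{"User-Agent": PROFILES[1]["User-Agent"]})
print(looks_consistent(PROFILES[0]), looks_consistent(rotated))
```

Under this toy rule, a fixed profile passes while a rotated user agent with unchanged companion headers fails, which is one way rotation can make a bot easier to spot rather than harder.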

CAPTCHA is a bit more difficult but not impossible to defeat. That said, for Adorama, I have yet to encounter a CAPTCHA using this bot. However, I did encounter one when I searched for "rtx" in Chrome on my Mac.

Edit: Available now on Docker Hub!