EricJMarti / inventory-hunter

⚡️ Get notified as soon as your next CPU, GPU, or game console is in stock
MIT License
1.12k stars, 263 forks

Amazon scraper inconsistent/not reliable #68

Open utahbmxer opened 3 years ago

utahbmxer commented 3 years ago

I haven't checked the other scrapers as closely, but the Amazon one seems to have issues (aside from #51) that could cause it to miss an item or alert late. Here is what I am seeing: I put a single item (an RTX 2080 that is available) into a config, started the container, and watched the logs. Many checks log as "not in stock", then it finally alerts "in stock" much later than I would expect. The subsequent checks report "not in stock" again, and this continues for several more checks before it randomly alerts "in stock" again. I really suck at Python; I want to help out, but I can't follow the code well enough to see what's happening.

I2020-12-05 05:13:49,399 scraper initialized for https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
W2020-12-05 05:13:51,402 warning: using selenium webdriver for scraping... this feature is under active development
W2020-12-05 05:13:54,273 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:13:54,277 B08CLV8CKP: not in stock
W2020-12-05 05:13:58,602 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:13:58,605 B08CLV8CKP: not in stock
W2020-12-05 05:14:03,011 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:03,015 B08CLV8CKP: not in stock
W2020-12-05 05:14:07,671 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:07,674 B08CLV8CKP: not in stock
I2020-12-05 05:14:16,376 B08CLV8CKP: now in stock at 939.99!
W2020-12-05 05:14:20,897 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:20,902 B08CLV8CKP: not in stock
W2020-12-05 05:14:25,415 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:25,417 B08CLV8CKP: not in stock
W2020-12-05 05:14:30,144 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:30,146 B08CLV8CKP: not in stock
W2020-12-05 05:14:34,454 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:34,457 B08CLV8CKP: not in stock
W2020-12-05 05:14:38,961 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:38,963 B08CLV8CKP: not in stock
W2020-12-05 05:14:43,947 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:43,950 B08CLV8CKP: not in stock
W2020-12-05 05:14:48,691 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:48,694 B08CLV8CKP: not in stock
W2020-12-05 05:14:53,083 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:53,086 B08CLV8CKP: not in stock
W2020-12-05 05:14:57,544 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:57,546 B08CLV8CKP: not in stock
W2020-12-05 05:15:02,491 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:02,493 B08CLV8CKP: not in stock
W2020-12-05 05:15:07,265 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:07,268 B08CLV8CKP: not in stock
W2020-12-05 05:15:11,628 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:11,631 B08CLV8CKP: not in stock
W2020-12-05 05:15:16,007 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:16,012 B08CLV8CKP: not in stock
W2020-12-05 05:15:20,535 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:20,537 B08CLV8CKP: not in stock
W2020-12-05 05:15:25,497 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:25,503 B08CLV8CKP: not in stock
W2020-12-05 05:15:30,303 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:30,306 B08CLV8CKP: not in stock
I2020-12-05 05:15:38,673 B08CLV8CKP: now in stock at 939.99!
W2020-12-05 05:15:44,671 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:44,673 B08CLV8CKP: not in stock
W2020-12-05 05:15:49,413 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:49,415 B08CLV8CKP: not in stock
W2020-12-05 05:15:54,225 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:54,230 B08CLV8CKP: not in stock
W2020-12-05 05:15:58,513 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:15:58,515 B08CLV8CKP: not in stock
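
To make the pattern in the log above easier to see, here is a minimal Python sketch that counts how often a "not in stock" result directly follows a "missing title" warning. It is not part of inventory-hunter; `LINE_RE`, `SAMPLE_LOG`, and `summarize` are hypothetical names, and the line format is assumed from the excerpt above (level letter, timestamp, message).

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt: <I|W><date> <time>,<ms> <message>
LINE_RE = re.compile(r"^[IW]\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (?P<msg>.*)$")

# A few lines copied from the log above.
SAMPLE_LOG = """\
W2020-12-05 05:14:07,671 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:07,674 B08CLV8CKP: not in stock
I2020-12-05 05:14:16,376 B08CLV8CKP: now in stock at 939.99!
W2020-12-05 05:14:20,897 missing title: https://www.amazon.com/MSI-GeForce-Architecture-Overclocked-Graphics/dp/B08CLV8CKP
I2020-12-05 05:14:20,902 B08CLV8CKP: not in stock
"""


def summarize(log_text):
    """Count stock results, split by whether a 'missing title' warning preceded them."""
    counts = Counter()
    missing_title = False
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip lines that do not look like log entries
        msg = match.group("msg")
        if msg.startswith("missing title:"):
            missing_title = True  # remember the warning for the next result line
            continue
        suffix = "after_missing_title" if missing_title else "clean"
        if msg.endswith(": not in stock"):
            counts[f"not_in_stock_{suffix}"] += 1
        elif ": now in stock" in msg:
            counts[f"in_stock_{suffix}"] += 1
        missing_title = False  # the warning only applies to the line right after it
    return counts


summary = summarize(SAMPLE_LOG)
for key, value in sorted(summary.items()):
    print(f"{key}: {value}")
```

On the sample lines, both "not in stock" results follow a "missing title" warning, while the single "in stock" result does not, which matches what the full log shows.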

Pipodi commented 3 years ago

Getting the same problem with Amazon.it. I suppose it's something related to the Chromium driver. I suck at Python too, so I can only guess what is going on here. Could it be caching a previous result and returning false results?

EricJMarti commented 3 years ago

@Pipodi The chromedriver zombie bug was just fixed.

Please pull the latest image using:

$ docker pull ericjmarti/inventory-hunter:latest

The "missing title" errors you are seeing are caused by Amazon; they detected that you are running a web scraper and have revoked access. I am investigating workarounds now.
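
As an illustration of why a blocked page shows up as "not in stock" in the logs, here is a minimal Python sketch of a check that reports a page without a product title as "unknown" instead. It is not the project's actual scraper code: the element ids `productTitle` and `price`, the `Status` enum, and the `classify` function are all assumptions made for this example, and real Amazon pages may be structured differently.

```python
from enum import Enum
from html.parser import HTMLParser


class Status(Enum):
    IN_STOCK = "in stock"
    OUT_OF_STOCK = "not in stock"
    UNKNOWN = "unknown (page may be blocked)"


class _IdTextCollector(HTMLParser):
    """Collects the text that directly follows the opening tag of selected element ids."""

    def __init__(self, ids):
        super().__init__()
        self.text = {element_id: "" for element_id in ids}
        self._current = None

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.text:
            self._current = element_id

    def handle_endtag(self, tag):
        # Stop collecting at the first closing tag (good enough for flat elements).
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.text[self._current] += data


def classify(html):
    """Return UNKNOWN when the title is missing, rather than treating it as out of stock."""
    # Hypothetical element ids, chosen for illustration only.
    collector = _IdTextCollector({"productTitle", "price"})
    collector.feed(html)
    collector.close()
    if not collector.text["productTitle"].strip():
        return Status.UNKNOWN  # no title: likely a block or CAPTCHA page
    if collector.text["price"].strip():
        return Status.IN_STOCK
    return Status.OUT_OF_STOCK


blocked_page = "<html><head><title>Robot Check</title></head><body><p>Enter the characters</p></body></html>"
product_page = '<html><body><span id="productTitle"> MSI GeForce </span><span id="price">$939.99</span></body></html>'
sold_out_page = '<html><body><span id="productTitle">MSI GeForce</span></body></html>'

for name, page in [("blocked", blocked_page), ("product", product_page), ("sold out", sold_out_page)]:
    print(f"{name}: {classify(page).value}")
```

Keeping "unknown" separate from "not in stock" means a revoked session would be visible in the output instead of looking like a run of legitimate out-of-stock checks.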

EricJMarti commented 3 years ago

I am getting much better results on Amazon after this commit: https://github.com/EricJMarti/inventory-hunter/commit/e076181ca702b08c5271186197c51742d56f952e

Please update to the latest image and give it another try.

Pipodi commented 3 years ago

@EricJMarti Yeah, now it works! Thanks. If you have time, we could brainstorm and address https://github.com/EricJMarti/inventory-hunter/issues/54 in some more efficient way.

EricJMarti commented 3 years ago

@Pipodi Yeah definitely. Right now, I'm working on standing up a unit testing framework so that we can add proper internationalization without breaking anything.