Open xWildxChildx opened 3 years ago
`--- refresh_interval: 60 # seconds urls:
As an update, the error I'm getting is as follows: caught exception during request: Message: session not created from timeout: Timed out receiving message from renderer: 600.000 (Session info: headless chrome=87.0.4280.88)
This is sometimes after one successful scrape? Unclear what the issue is.
I am also having this same problem and getting the same errors. Only the New Egg scrapers are working for me currently.
As a further update, I decided to wipe everything and start over, thinking I had done something wrong. I then followed the setup from beginning to end as I originally had, using the premade configs to rule out a mistake of my own in the config. Still running into the same issue. I'm currently using bhphotovideo links, as well as newegg. Those are fine. Best Buy links are doing nothing. In some cases the first link will work normally, then it will cease operation. I've tried longer refreshes, although that seems to be unrelated.
Another update, I learned the pkill x command (as I said, I'm new lol) and that seems to have resolved the error message, however the issue remains with BestBuy links doing nothing.
Best Buy is blocking scrapers that are using headless browsers - meaning the one being deployed by this scraper. Same with Amazon.
When did they start doing that?
On Tue, Jan 12, 2021 at 9:47 AM lonicade notifications@github.com wrote:
Best Buy is blocking scrapers that are using headless browsers - meaning the one being deployed by this scraper. Same with Amazon.
Yeah, all of my containers are giving me the same error, which prevents any scraping from happening.
I'm not sure when - but there are threads on Reddit discussing the issue dating back two years. Best Buy really tries to force you to go in like a human and use the customer-facing page. Maybe it's something they turn on and off when they have issues with stock (PS5/Xbox and GPUs, for instance)? No clue. I found similar threads for Amazon.
It may be a primitive form of rate limiting on valuable items so that they have valid usage data. I can see this being a problem for scalpers and people like us.
In messing around with the config.yaml files, it seems these errors have to do with the website links themselves. I deleted all of the config.yaml files, created new ones with new links, and it's working. I created new .yaml files per website instead of per product.
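For anyone wanting to try the per-website layout described above, here is a minimal sketch that groups a flat list of product links by site and renders one config per site. The function names are hypothetical, and the `urls:` list format is an assumption based on the truncated config at the top of this thread; check it against the premade configs before relying on it.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_urls_by_site(urls):
    """Group product URLs by hostname so each site gets its own config."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or "unknown"
        # Drop a leading "www." so www.bestbuy.com and bestbuy.com share a file.
        if host.startswith("www."):
            host = host[4:]
        groups[host].append(url)
    return dict(groups)


def render_config(urls, refresh_interval=60):
    """Render one config; the YAML list layout under `urls:` is assumed."""
    lines = ["---", f"refresh_interval: {refresh_interval}  # seconds", "urls:"]
    lines += [f"  - {url}" for url in urls]
    return "\n".join(lines) + "\n"


# Example with two links quoted elsewhere in this thread.
links = [
    "https://www.bestbuy.com/site/amd-ryzen-9-5900x-4th-gen-12-core-24-threads-unlocked-desktop-processor-without-cooler/6438942.p?skuId=6438942",
    "https://www.bestbuy.ca/en-ca/product/asus-rog-strix-nvidia-geforce-rtx-3080-10gb-gddr6x-video-card/14954116",
]
for host, site_urls in group_urls_by_site(links).items():
    print(f"# {host}.yaml")
    print(render_config(site_urls))
```

This only reorganizes links; it would not by itself get around a site that blocks headless browsers.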
Also, I wanted to reference the headless Chrome issues: judging by karma-runner/karma-chrome-launcher, it seems Headless Chromium with Puppeteer could be a fix?
Can you share one of the best buy links you used that ended up working? When I set it up a few days ago, I did it the way you describe, but they still fail.
*doesn't have to be for one of the products you're interested in... any link from Best Buy that would work is fine.
** Have you checked the logs to verify it's actually scraping?
This wasn't an issue probably 2 weeks ago. Now it is. Happening with any links to best buy.
I am also now encountering the same issue with BestBuy. Definitely seems to be something they recently changed on their end.
It looks like newegg.ca (with some restricted refresh times), memoryexpress, and amazon.ca are being scraped just fine. But I'm not sure about bestbuy; here's the log I'm getting:
2021-01-14 11:57:12,418 [bstby_c_1] missing title: https://www.bestbuy.ca/en-ca/product/asus-rog-strix-nvidia-geforce-rtx-3080-10gb-gddr6x-video-card/14954116
W2021-01-14 11:57:12,433 [bstby_c_1] missing price: https://www.bestbuy.ca/en-ca/product/asus-rog-strix-nvidia-geforce-rtx-3080-10gb-gddr6x-video-card/14954116
I2021-01-14 11:57:12,441 [bstby_c_1] not in stock
It looks like it's working, but I've added a few "test" links and it doesn't detect those products at all.
Using this link: https://www.bestbuy.com/site/amd-ryzen-9-5900x-4th-gen-12-core-24-threads-unlocked-desktop-processor-without-cooler/6438942.p?skuId=6438942
With a refresh interval: 15 seconds
Still getting error:
E2021-01-14 22:45:55,186 [bstby_1] scrape failed
E2021-01-14 22:56:14,992 [bstby_1] caught exception during request: Message: session not created from timeout: Timed out receiving message from renderer: 600.000 (Session info: headless chrome=87.0.4280.88)
Still occurs even with a 30 second refresh rate. Not using Amazon, but bhphotovideo, and newegg are working fine at much lower refresh rates.
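To see at a glance which scrapers are failing and which are merely reporting "not in stock", the log lines can be tallied per scraper. The sketch below assumes the line shape seen in the excerpts in this thread (an optional one-letter level fused to the timestamp, then the scraper name in square brackets); the regex and function name are illustrative and not part of the project.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in this thread:
#   E2021-01-14 22:45:55,186 [bstby_1] scrape failed
# The leading level letter is treated as optional.
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z])?"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<scraper>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def tally_by_scraper(lines):
    """Count log lines per (scraper, level); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            # Lines without a level letter are counted under "?".
            counts[(match["scraper"], match["level"] or "?")] += 1
    return counts


sample = [
    "E2021-01-14 22:45:55,186 [bstby_1] scrape failed",
    "I2021-01-14 11:57:12,441 [bstby_c_1] not in stock",
]
for (scraper, level), count in sorted(tally_by_scraper(sample).items()):
    print(scraper, level, count)
```

A scraper that only ever shows up under `E` is probably failing on every cycle, while one with `I` lines is at least completing requests.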
Same issue as OP with screenshot from my logs. It appears to be the same with the other sites except for NewEgg now.
It looks like newegg.ca (with some restricted refresh times), memoryexpress, and amazon.ca are being scraped just fine. But I'm not sure about bestbuy; here's the log I'm getting:
2021-01-14 11:57:12,418 [bstby_c_1] missing title: https://www.bestbuy.ca/en-ca/product/asus-rog-strix-nvidia-geforce-rtx-3080-10gb-gddr6x-video-card/14954116
W2021-01-14 11:57:12,433 [bstby_c_1] missing price: https://www.bestbuy.ca/en-ca/product/asus-rog-strix-nvidia-geforce-rtx-3080-10gb-gddr6x-video-card/14954116
I2021-01-14 11:57:12,441 [bstby_c_1] not in stock
It looks like it's working, but I've added a few "test" links and it doesn't detect those products at all.
Interesting - those are for Best Buy Canada, I see. I still have the same scrape failure (headless chrome) messages for the US site. The logs show there are some periods of time where it works, still, so it's confusing why that would be... Even when it works, the logs update very slowly.. maybe due to Best Buy taking a long time to respond to the request?
Who knows... I finally nabbed a card for my gaming rig from B&H - that scraper worked really well.
Encountering the following error upon creating a container for Best Buy links:
_XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
_XSERVTransMakeAllCOTSServerListeners: server already running
(EE)
Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
This previously would occur and not really seem to matter as the container would continue operation. However now it initializes all scrapers, and proceeds to do nothing at all. Even if it is the only container, it sits there and does nothing after initializing scrapers. Any help is much appreciated.
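One way to investigate the "server already running" message is to look at which X display numbers already have a socket. X servers conventionally listen on Unix sockets named `X<display>` under `/tmp/.X11-unix`. The sketch below lists the display numbers present there and picks the lowest unused one. Whether this container starts its X server in a way that lets you choose the display is an assumption, and the function names are hypothetical.

```python
import os
import re

# Socket names look like X0, X1, X99; anything else is ignored.
X_SOCKET = re.compile(r"^X(\d+)$")


def displays_in_use(socket_dir="/tmp/.X11-unix"):
    """Return the sorted display numbers that have a socket entry."""
    try:
        names = os.listdir(socket_dir)
    except FileNotFoundError:
        return []  # No socket directory: no X server has created one here.
    numbers = []
    for name in names:
        match = X_SOCKET.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def first_free_display(socket_dir="/tmp/.X11-unix", start=0):
    """Return the lowest display number >= start with no socket entry."""
    used = set(displays_in_use(socket_dir))
    number = start
    while number in used:
        number += 1
    return number
```

Running `displays_in_use()` inside the container shows which displays are taken. Note that a socket entry can be left behind by a server that has already exited, so an entry here is a hint rather than proof that a server is still running.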