itrajanovska closed this issue 1 year ago
So far it has not been necessary to render the page or its images, because we are only interested in the URL. All spiders use the minimal_script, which disables all rendering, see: https://github.com/calgo-lab/green-db/blob/302c6ebb27bcd387dbcf37004e4bde28114531d7/scraping/scraping/splash.py#L31-L35
In the scrape from 10.02.23 we got a lot of Amazon products with the following image URL:
https://m.media-amazon.com/images/W/IMAGERENDERINGjpg
This has happened before as well, but this time it seems to have affected 9 times more products than it occasionally did in the past. Maybe we should increase the delay for rendering those pages?
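
To get a number for how many products are affected in a given scrape, something like the following could be run over the extracted image URLs. This is only a rough sketch: the helper names `is_placeholder_image_url` and `placeholder_ratio` are made up for illustration and are not part of green-db, and it assumes the placeholder can be recognised by the `IMAGERENDERING` marker in the last path segment, as in the URL above.

```python
from typing import Iterable
from urllib.parse import urlparse

# Assumption: the broken URLs all carry this marker in their final path segment,
# as in https://m.media-amazon.com/images/W/IMAGERENDERINGjpg
PLACEHOLDER_MARKER = "IMAGERENDERING"


def is_placeholder_image_url(url: str) -> bool:
    """Return True if the URL looks like the placeholder instead of a real image."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    return PLACEHOLDER_MARKER in last_segment


def placeholder_ratio(image_urls: Iterable[str]) -> float:
    """Return the share of URLs that look like the placeholder (0.0 if empty)."""
    urls = list(image_urls)
    if not urls:
        return 0.0
    affected = sum(is_placeholder_image_url(url) for url in urls)
    return affected / len(urls)
```

Comparing this ratio between scrapes would show whether a longer delay actually reduces the number of affected products.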