alltheplaces / alltheplaces

A set of spiders and scrapers to extract location information from places that post their location on the internet.
https://www.alltheplaces.xyz
Other
610 stars, 207 forks

Storefinder: "_next/data/(key)/*.json" #7762

Open CloCkWeRX opened 6 months ago

CloCkWeRX commented 6 months ago

https://curaleaf.com/_next/data/sjXHfcM099K6PZVZVtK3H/locations.json

7 results - 7 files

locations/spiders/burger_king_cz.py:

```python
yield JsonRequest(
    url=f"https://burgerking.cz/_next/data/{next_build_id}/restaurants.json", callback=self.parse_locations
)
```

locations/spiders/burger_king_pl.py:

```python
next_build_id = response.xpath("//script[contains(@src, '_ssgManifest.js')]/@src").get().split("/")[3]
url = f"https://burgerking.pl/_next/data/{next_build_id}/restaurants.json"
yield JsonRequest(url=url, callback=self.parse_api)
```

locations/spiders/crumbl_cookies_us.py:

```python
next_build_id = response.xpath("//script[contains(@src, '_ssgManifest.js')]/@src").get().split("/")[3]
url = f"https://crumblcookies.com/_next/data/{next_build_id}/en-US/stores.json"
yield JsonRequest(url=url, callback=self.parse_api)
```

locations/spiders/delikatesy_centrum_pl.py:

```python
next_build_id = response.xpath("//script[contains(@src, '_ssgManifest.js')]/@src").get().split("/")[3]
url = f"https://www.delikatesy.pl/_next/data/{next_build_id}/sklepy.json"
yield JsonRequest(url=url, callback=self.parse_api)
```

locations/spiders/quick_be_lu.py:

```python
)
yield JsonRequest(f"https://www.quick.be/_next/data/{build_id}/fr/restaurants.json")
```

locations/spiders/teknikmagasinet.py:

```python
start_urls = [
    "https://www.teknikmagasinet.se/_next/data/l27oTv8kIMOzrHw2WFLQi/sv/teknikmagasinet/find-your-store.json"
]
```

locations/spiders/tommy_hr.py:

```python
item_attributes = {"brand": "Tommy", "brand_wikidata": "Q12643718"}
start_urls = ["https://www.tommy.hr/_next/data/NQBnI1_5yBtg95innap3m/hr-HR/prodavaonice.json"]
```
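Two patterns show up in these results. Several spiders take the `src` of the `_ssgManifest.js` script and keep element 3 of `split("/")`. That element is the build ID only when the `src` is root-relative (`/_next/static/<buildId>/_ssgManifest.js`). An absolute CDN URL shifts the segments. teknikmagasinet and tommy_hr instead hardcode the ID. By default, Next.js generates a new build ID on every build, so those URLs typically stop working after the site redeploys. The following is a minimal sketch of a more tolerant extraction and of substituting a fresh ID into a hardcoded URL. The helper names `build_id_from_manifest_src` and `with_build_id` are hypothetical, and the `abc123` / `cdn.example.com` values are made up.

```python
import re
from typing import Optional

# Next.js serves the SSG manifest from ".../_next/static/<buildId>/_ssgManifest.js";
# searching for that suffix also works behind a CDN host or a basePath.
SSG_MANIFEST_SRC = re.compile(r"/_next/static/([^/]+)/_ssgManifest\.js")
# The "/_next/data/<buildId>/" segment of a data URL.
DATA_BUILD_ID = re.compile(r"/_next/data/[^/]+/")


def build_id_from_manifest_src(src: str) -> Optional[str]:
    """Return the build ID embedded in a _ssgManifest.js script src, if any."""
    match = SSG_MANIFEST_SRC.search(src)
    return match.group(1) if match else None


def with_build_id(url: str, build_id: str) -> str:
    """Replace the build ID in a hardcoded /_next/data/ URL with a fresh one."""
    # A callable replacement stops re from interpreting backslashes in build_id.
    new_url, count = DATA_BUILD_ID.subn(lambda _m: f"/_next/data/{build_id}/", url, count=1)
    if count == 0:
        raise ValueError(f"no /_next/data/<buildId>/ segment in {url}")
    return new_url


relative = "/_next/static/abc123/_ssgManifest.js"
absolute = "https://cdn.example.com/_next/static/abc123/_ssgManifest.js"
print(relative.split("/")[3], absolute.split("/")[3])  # → abc123 _next
print(build_id_from_manifest_src(absolute))  # → abc123

stale = "https://www.tommy.hr/_next/data/NQBnI1_5yBtg95innap3m/hr-HR/prodavaonice.json"
print(with_build_id(stale, "abc123"))
# → https://www.tommy.hr/_next/data/abc123/hr-HR/prodavaonice.json
```

Keeping the hardcoded URL as a template means the page path and locale prefix stay where the spider author put them. Only the build ID is refreshed at crawl time.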

davidhicks commented 5 months ago

I like the idea of a new storefinder here that automatically detects the Next.js build identifier and then downloads a static JSON file specified as a parameter to the storefinder. I don't think the storefinder can or should attempt to parse the JSON file; that should be left to the individual spider. Every brand is free to format the JSON file however it likes, and there is no consistency unless the brand also uses some other storefinder built specifically to integrate with Next.js.
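The proposal is to detect the build ID, fetch a JSON path given as a parameter, and leave parsing to each spider. It could look roughly like the stdlib-only sketch below. `NextDataStorefinder`, `next_data_path`, `parse_next_data` and `ExampleSpider` are hypothetical names. The sketch assumes a Pages Router site that embeds its build ID in a `<script id="__NEXT_DATA__">` JSON blob, and the `stores`/`id`/`name` keys are invented. In an actual spider, the URL from `data_url` would be passed to a `JsonRequest` like the ones above, with `parse_next_data` as the callback.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, Optional


class _NextDataScript(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.payload: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True
            self.payload = ""

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.payload += data


def next_build_id(html: str) -> str:
    """Read buildId from the page's __NEXT_DATA__ JSON (Pages Router sites)."""
    parser = _NextDataScript()
    parser.feed(html)
    parser.close()
    if not parser.payload:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads(parser.payload)["buildId"]


class NextDataStorefinder:
    """Hypothetical base: shared build-ID detection and URL construction only."""

    next_data_path = ""  # page route, optionally locale-prefixed, e.g. "fr/restaurants"

    def data_url(self, origin: str, html: str) -> str:
        build_id = next_build_id(html)
        return f"{origin.rstrip('/')}/_next/data/{build_id}/{self.next_data_path}.json"

    def parse_next_data(self, data: Any) -> Iterator[dict]:
        # Each brand formats its JSON differently, so parsing stays per spider.
        raise NotImplementedError


class ExampleSpider(NextDataStorefinder):
    next_data_path = "en-US/stores"

    def parse_next_data(self, data: Any) -> Iterator[dict]:
        # "stores", "id" and "name" are invented keys for illustration.
        for store in data["pageProps"]["stores"]:
            yield {"ref": store["id"], "name": store["name"]}


page = '<script id="__NEXT_DATA__" type="application/json">{"buildId": "abc123"}</script>'
print(ExampleSpider().data_url("https://www.example.com", page))
# → https://www.example.com/_next/data/abc123/en-US/stores.json
```

Only build-ID detection and URL construction are shared. `parse_next_data` stays abstract because, as noted above, each brand's JSON layout is different.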

Cj-Malone commented 5 months ago

Yeah, it's not a literal storefinder, but I agree a Next.js helper could be useful.