CloCkWeRX opened this issue 3 months ago
In theory, this should work:

```python
from scrapy.spiders import SitemapSpider

from locations.settings import DEFAULT_PLAYWRIGHT_SETTINGS
from locations.structured_data_spider import StructuredDataSpider
from locations.user_agents import BROWSER_DEFAULT


class AceAndTateSpider(SitemapSpider, StructuredDataSpider):
    name = "ace_and_tate"
    sitemap_urls = ["https://www.aceandtate.com/robots.txt"]
    # Example: https://www.aceandtate.com/nl-en/stores/netherlands/amsterdam/van-woustraat-67-h
    sitemap_rules = [(r"/stores/[\w-]+/[\w-]+/[\w-]+$", "parse")]
    item_attributes = {"brand": "Ace & Tate", "brand_wikidata": "Q110516413"}
    wanted_types = ["Optician"]
    user_agent = BROWSER_DEFAULT
    is_playwright_spider = True
    custom_settings = DEFAULT_PLAYWRIGHT_SETTINGS  # actually apply the imported Playwright settings
```
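As a quick standalone sanity check that the sitemap rule matches the store-page pattern (using only the sample URL quoted in this issue):

```python
import re

# Same pattern as in sitemap_rules above
SITEMAP_RULE = re.compile(r"/stores/[\w-]+/[\w-]+/[\w-]+$")

# Sample store page URL from this issue
url = "https://www.aceandtate.com/nl-en/stores/netherlands/amsterdam/van-woustraat-67-h"
assert SITEMAP_RULE.search(url) is not None
print("sitemap rule matches:", url)
```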
However, I get bot-protection behaviour even when requesting the robots.txt.
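One minimal way to confirm this outside Scrapy is to fetch robots.txt directly with a browser-like User-Agent. This is just a sketch using the `requests` library (not part of the alltheplaces stack), and the header value is an arbitrary example:

```python
import requests

# Probe robots.txt with a browser-like User-Agent to see whether the
# bot protection blocks plain HTTP clients as well.
resp = requests.get(
    "https://www.aceandtate.com/robots.txt",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"},
    timeout=10,
)
print(resp.status_code)   # 403/503 here usually indicates a bot-protection challenge
print(resp.text[:200])    # an HTML challenge page instead of plain robots.txt confirms it
```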
Brand name
Ace & Tate
Wikidata ID
Q110516413 (https://www.wikidata.org/wiki/Q110516413, JSON: https://www.wikidata.org/wiki/Special:EntityData/Q110516413.json)

Store finder URL(s)
https://www.aceandtate.com/nl-en/stores

Official URL(s)
https://www.aceandtate.com/

`pipenv run scrapy sf --brand-wikidata=Q110516413 https://www.aceandtate.com/`
Sample store page URL
https://www.aceandtate.com/nl-en/stores/netherlands/amsterdam/van-woustraat-67-h
Countries?
Multiple
Difficulty?
None
Number of POI?
70?
Behaviours
- `pipenv run scrapy sd (specific page url)` or validator has content
- `pipenv run scrapy sitemap (url)`