Austerius / Pinnacle-Scraper

Scraping esports betting information from Pinnacle.com using Scrapy and Selenium
BSD 3-Clause "New" or "Revised" License

Pinnacle-Scraper

Scraping esports betting information from the website www.pinnacle.com using Scrapy and Selenium.

Note: this script was created for educational purposes, to demonstrate the use of Scrapy pipelines, LinkExtractors, Rules, generic spiders, Items, and XPath selectors.

So, what exactly does this spider do? The general algorithm:

  1. Gather links to the betting page of each esports event (using an appropriate set of rules).
  2. Follow each extracted link and scrape the esports data.
  3. Filter gathered data in the pipeline.

Once the crawl has finished, we get information about every upcoming esports event. Events that have already finished (or are in progress) are excluded, as is betting data for non-primary markets (such as bets on "first blood", "second map winner", etc.). Event/game times are also converted to UTC. (If you want to include all events and keep the original "site time", comment out the code inside the "pipelines.py" file or disable the pipelines in "settings.py".)
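The filtering and time conversion described above can be sketched as a Scrapy item pipeline. This is a minimal illustration, not the project's actual "pipelines.py": the class name, the item keys "game" and "date", the time format, the marker strings, and the meaning given to TIME_DIFFERENCE (hours the site time is ahead of UTC) are all assumptions made for the example. Only the `process_item(item, spider)` method and the `DropItem` exception come from Scrapy itself.

```python
from datetime import datetime, timedelta, timezone

try:
    # Scrapy's exception for discarding an item inside a pipeline.
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so the sketch also runs without Scrapy installed.
    class DropItem(Exception):
        pass

# Assumption: number of hours the site's displayed time is ahead of UTC.
TIME_DIFFERENCE = -7

# Hypothetical markers that identify non-primary betting markets.
SECONDARY_MARKERS = ("first blood", "map winner")


class UpcomingPrimaryEventsPipeline:
    """Keep only upcoming primary events and store their time in UTC."""

    def __init__(self, now=None):
        # A fixed "now" can be injected for testing; Scrapy passes nothing.
        self._now = now

    def process_item(self, item, spider):
        # Convert the site time (hypothetical "date" key) to UTC.
        site_time = datetime.strptime(item["date"], "%Y-%m-%d %H:%M")
        utc_time = site_time - timedelta(hours=TIME_DIFFERENCE)
        utc_time = utc_time.replace(tzinfo=timezone.utc)

        # Drop events that have already started or finished.
        now = self._now or datetime.now(timezone.utc)
        if utc_time <= now:
            raise DropItem("event already started or finished")

        # Drop non-primary markets such as "first blood".
        title = item["game"].lower()
        if any(marker in title for marker in SECONDARY_MARKERS):
            raise DropItem("not a primary event")

        item["date"] = utc_time.strftime("%Y-%m-%d %H:%M")
        return item
```

In a Scrapy project a pipeline like this is switched on through the ITEM_PIPELINES setting in "settings.py"; removing its entry there disables the filtering without touching the pipeline code.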

Keys and descriptions for each returned line of information:

This script was written in Python 3.6 (for Scrapy 1.5) and tested on a Windows machine. Before running it, you'll need to install:

After installing all requirements, copy the "Pinnacle" folder to your machine/device. Open the "pipelines.py" file and set the "TIME_DIFFERENCE" variable to your own value (if needed).

To run the spider, change to the Scrapy project folder in your terminal and type:
scrapy crawl pinnacle
To save the data to a .json file (for example), type:
scrapy crawl pinnacle -o yourfile.json
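The saved file can then be read back with the standard library. The sketch below assumes the file holds a single JSON array with one object per scraped event, which is what Scrapy's JSON feed export writes; the function name `load_events` is made up for the example.

```python
import json


def load_events(path):
    """Return the scraped events stored in a Scrapy JSON export."""
    with open(path, encoding="utf-8") as handle:
        events = json.load(handle)
    # The JSON feed export is expected to be one top-level array.
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events


if __name__ == "__main__":
    events = load_events("yourfile.json")
    print(len(events), "events loaded")
```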