Scraping esports betting information from the website www.pinnacle.com using Scrapy and Selenium.
Note: this script was created for educational purposes, to demonstrate the use of Scrapy Pipelines, LinkExtractors, Rules, generic spiders, Items, and XPath selectors.
What this spider does (general algorithm):
When the crawl finishes, we get information about every upcoming esports event. Events that have already finished or are in progress are excluded, as is betting data for non-primary markets (such as bets on "first blood", "second map winner", etc.). Event/game times are also converted to UTC. (If you want to include all events and keep the original "site time", comment out the code inside the "pipelines.py" file or disable the pipelines in "settings.py".)
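The filtering and time conversion described above can be sketched roughly as follows. This is only an illustration of the idea, not the project's actual pipeline: the item keys `market` and `site_time`, the value `"match_winner"`, the `DropEvent` exception (a stand-in for Scrapy's `DropItem`, so the sketch runs without Scrapy installed), and the reading of `TIME_DIFFERENCE` as "hours the site time is offset from UTC" are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: site time = UTC + TIME_DIFFERENCE hours (e.g. -7 for UTC-7).
TIME_DIFFERENCE = -7


class DropEvent(Exception):
    """Hypothetical stand-in for scrapy.exceptions.DropItem."""


def to_utc(site_time, offset_hours=TIME_DIFFERENCE):
    """Convert a naive "site time" datetime into an aware UTC datetime."""
    utc_naive = site_time - timedelta(hours=offset_hours)
    return utc_naive.replace(tzinfo=timezone.utc)


def process_item(item, now=None):
    """Keep only upcoming primary-market events and add a UTC start time.

    `item` is a plain dict with hypothetical keys "market" and "site_time".
    """
    now = now or datetime.now(timezone.utc)

    # Skip secondary markets such as "first blood" or "second map winner".
    if item["market"] != "match_winner":
        raise DropEvent("not a primary market")

    # Skip events that have already started or finished.
    start = to_utc(item["site_time"])
    if start <= now:
        raise DropEvent("event already started")

    result = dict(item)
    result["utc_time"] = start.isoformat()
    return result
```

In a real Scrapy pipeline the same checks would live in the pipeline class's `process_item(self, item, spider)` method and raise `scrapy.exceptions.DropItem` instead of the placeholder exception.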
Keys and descriptions for each returned item:
This script was written in Python 3.6 (for Scrapy 1.5) and tested on a Windows machine. Before running it, you'll need to install:
After installing all the requirements, copy the "Pinnacle" folder to your machine/device. Open the "pipelines.py" file and set the variable "TIME_DIFFERENCE" to your own value (if needed).
To run the spider, change to the Scrapy project folder in your terminal and type:
scrapy crawl pinnacle
To save the data to a .json file (for example), type:
scrapy crawl pinnacle -o yourfile.json
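Once the crawl has written the file, the results can be read back with the standard library. The sketch below assumes the file is a JSON array with one object per scraped item, which is what Scrapy's JSON feed export produces; the helper name `load_events` is made up for this example.

```python
import json


def load_events(path):
    """Return the list of scraped items stored in a Scrapy JSON feed file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Example usage (assumes the crawl above has already created the file):
# events = load_events("yourfile.json")
# print(len(events), "events loaded")
```

Note that with Scrapy 1.5 the `-o` option appends to an existing file, so delete or rename an old `yourfile.json` before crawling again; otherwise the file is likely to stop being valid JSON.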