VasssilPopov / AllSpiders

Some spiders

About the folder structure

The main folder is AllSpiders.

It contains the following subfolders:

Blitz - everything for one specific spider. The same set of files and subfolders is repeated for each spider.

BlitzSpider.py - the spider program.

Cleaning.py - data verification program specific to Blitz.

RunIt.bat - runs only the current spider (BlitzSpider at the moment). !!! The current date must be passed as a parameter. The date format must be YYYY-mm-dd (2017-05-06). The date becomes part of the report name for the spider being run (Blitz-2017-05-01.json).

Logs - subdirectory.
  output.txt - contains the printout of the latest BlitzSpider run.

Reports - subdirectory.
  Blitz-2017-05-01.json - spider result
  Blitz-2017-05-02.json - spider result
  Blitz-2017-05-03.json - spider result
  Blitz-2017-05-04.json - spider result
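
The report naming rule described above (spider name plus the run date in YYYY-mm-dd format) could be sketched in Python roughly as follows. This is only an illustration, not code from the repository: the helper names report_name and report_path are hypothetical, and the sketch assumes that Reports sits inside each spider's folder, as the layout above suggests.

```python
from datetime import datetime
from pathlib import Path


def report_name(spider: str, run_date: str) -> str:
    """Build a report file name such as Blitz-2017-05-01.json.

    Hypothetical helper: it only mirrors the naming rule described above.
    """
    # strptime raises ValueError when the text is not a valid YYYY-mm-dd date.
    parsed = datetime.strptime(run_date, "%Y-%m-%d")
    # Format the date again so month and day are always zero-padded.
    return f"{spider}-{parsed.strftime('%Y-%m-%d')}.json"


def report_path(root: Path, spider: str, run_date: str) -> Path:
    """Return <root>/<spider>/Reports/<report name> (assumed layout)."""
    return root / spider / "Reports" / report_name(spider, run_date)


if __name__ == "__main__":
    print(report_path(Path("AllSpiders"), "Blitz", "2017-05-01"))
```

A date in any other format, for example 01-05-2017, is rejected with a ValueError, so a badly formed parameter would be caught before a report file is written.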

How to get sources:

  1. Create a directory where everything will be stored: md Insects
  2. Go into it: cd Insects
  3. Create the local repository: git init
  4. Get all sources into c:/Insects: git clone https://github.com/VasssilPopov/AllSpiders.git