CivicTechTO / ttc_subway_times

A scraper to grab and publish TTC subway arrival times.
GNU General Public License v3.0

Put the Serverless data onto Spideroak #64

Closed: radumas closed this issue 4 years ago

radumas commented 5 years ago

I have data from the AWS scraper going back to March 19, 2019.

radumas commented 4 years ago

Quick Python script to pull the data for each month:

import os
from fetch_s3 import _fetch_s3
from datetime import datetime
from dateutil.relativedelta import relativedelta

for month in ['2019-04-01',
              '2019-05-01',
              '2019-06-01',
              '2019-07-01',
              '2019-08-01',
              '2019-09-01']:
    print('Making directory {}/'.format(month[:7]))
    try:
        os.mkdir(month[:7])
    except FileExistsError:
        print('Directory already exists')
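    start_date = datetime.strptime(month, '%Y-%m-%d')
    # Last day of the month: first of the next month minus one day
    end_date = start_date + relativedelta(months=1, days=-1)
    print('Fetching data for {}'.format(month))
    # Fetch this month's data into its directory ('ttcscraper' appears to
    # be the S3 bucket name)
    _fetch_s3(None, None, month[:7], start_date.strftime('%Y-%m-%d'),
              end_date.strftime('%Y-%m-%d'), 'ttcscraper')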
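The hard-coded month list and the `relativedelta` arithmetic in the script above can also be handled with the standard library alone. This is a minimal sketch, assuming you only need each month's `YYYY-MM` label plus its first and last day (the same inclusive range the script passes to `_fetch_s3`). `month_ranges` is a hypothetical helper for illustration, not part of the repo.

```python
import calendar
from datetime import date


def month_ranges(first_month, last_month):
    """Hypothetical helper: yield (label, start, end) for each calendar month
    from first_month through last_month inclusive; end is the month's last day."""
    year, month = first_month.year, first_month.month
    while (year, month) <= (last_month.year, last_month.month):
        # calendar.monthrange returns (weekday of day 1, number of days)
        last_day = calendar.monthrange(year, month)[1]
        start = date(year, month, 1)
        end = date(year, month, last_day)
        yield start.strftime('%Y-%m'), start, end
        # Advance to the next month, rolling over the year after December
        if month == 12:
            year, month = year + 1, 1
        else:
            month += 1


if __name__ == '__main__':
    for label, start, end in month_ranges(date(2019, 4, 1), date(2019, 9, 1)):
        print(label, start.isoformat(), end.isoformat())
```

Each `(label, start, end)` tuple maps onto the loop above: `label` replaces `month[:7]` as the directory name, and `start`/`end` replace the two `strftime` date strings.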