CivicTechTO / ttc_subway_times

A scraper to grab and publish TTC subway arrival times.
GNU General Public License v3.0

Bug trying to consolidate October Data #72

Open radumas opened 4 years ago

radumas commented 4 years ago
Extracting 2019-10-29.tar.gz
 98%|███████████████████████████████████▏| 40000/40983 [06:34<00:09, 100.08it/s]
Traceback (most recent call last):
  File "fetch_s3.py", line 212, in <module>
    fetch_s3()
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "fetch_s3.py", line 208, in fetch_s3
    _fetch_s3(aws_access_key_id, aws_secret_access_key, output_dir, start_date, end_date, bucket)
  File "fetch_s3.py", line 197, in _fetch_s3
    fetch_and_transform(to_download, output_dir)
  File "fetch_s3.py", line 80, in fetch_and_transform
    jsons_to_csv(tmpdir, output_dir)
  File "fetch_s3.py", line 117, in jsons_to_csv
    pd.DataFrame.from_records(requests, columns=requests[0]._fields).to_csv(
IndexError: list index out of range
radumas commented 4 years ago

So I think the issue is that the API wasn't working at all on Oct 5th, so a chunk of 2000 files (aka minutes?) contained no request data, and requests[0] fails on the empty list. I added a simple if statement to check whether the requests variable is empty, and consolidation worked... Will upload the fix soon.
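A minimal sketch of that guard, assuming each chunk's parsed rows are namedtuples collected in a list called requests, as the traceback suggests. The Request type, its field names, and the write_requests_chunk helper are hypothetical stand-ins, not the actual code in fetch_s3.py:

```python
from collections import namedtuple

import pandas as pd

# Hypothetical record type standing in for the namedtuple fetch_s3.py builds
# for each parsed request; the real field names are not shown in this thread.
Request = namedtuple("Request", ["requestid", "stationid", "lineid", "create_date"])


def write_requests_chunk(requests, path):
    """Write one chunk of parsed requests to CSV, skipping empty chunks.

    requests[0]._fields raises IndexError when a chunk holds no requests
    (e.g. a stretch of minutes where the API returned nothing), so check first.
    Returns True if a file was written, False if the chunk was skipped.
    """
    if not requests:
        return False
    pd.DataFrame.from_records(requests, columns=requests[0]._fields).to_csv(path, index=False)
    return True
```

Returning early means a chunk with no API responses simply produces no CSV, so anything downstream that expects one file per chunk would need to tolerate the gap.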

tloureiro commented 4 years ago

@radumas did you end up fixing this? I see the October consolidated file at https://spideroak.com/browse/share/raphaeld/ttc_subway_times/ttc_subway_times/serverless_data/ but I don't see the fix in fetch_s3.py.

radumas commented 4 years ago

@tloureiro thanks for catching this! I think I forgot to push code fixing this. I'll try to do so tonight...

radumas commented 2 years ago

Whoops, I didn't push a fix, and this is still an issue, most recently with the 2021-06 data:

Traceback (most recent call last):
  File "fetch_s3.py", line 210, in <module>
    fetch_s3()
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "fetch_s3.py", line 206, in fetch_s3
    fetch_and_transform(to_download, output_dir)
  File "fetch_s3.py", line 80, in fetch_and_transform
    jsons_to_csv(tmpdir, output_dir)
  File "fetch_s3.py", line 117, in jsons_to_csv
    pd.DataFrame.from_records(requests, columns=requests[0]._fields).to_csv(
IndexError: list index out of range
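Instead of skipping empty chunks with an if statement, another option would be to take the column names from the record class rather than from requests[0], so an empty chunk still yields a DataFrame with the right header. This sketch assumes every row is built from a single namedtuple class reachable where the CSV is written; Request, its fields, and requests_to_frame are hypothetical names, not taken from fetch_s3.py:

```python
from collections import namedtuple

import pandas as pd

# Hypothetical record type; assumes every request row in fetch_s3.py comes
# from one namedtuple class like this (real field names may differ).
Request = namedtuple("Request", ["requestid", "stationid", "lineid", "create_date"])


def requests_to_frame(requests):
    """Build a DataFrame whose columns come from the class, not from requests[0].

    An empty chunk yields an empty DataFrame with the expected columns
    instead of raising IndexError.
    """
    return pd.DataFrame.from_records(requests, columns=Request._fields)
```

With this approach, calling to_csv on the frame for an empty chunk writes only the header line, which keeps the set of output files consistent across chunks.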