Closed CharlotteJackson closed 3 years ago
This google sheet has information that can attempt to scrape https://docs.google.com/spreadsheets/d/1uCDD5ox8t4WDNB377PomsS3PS_QInaoE4_lGmqtxb-M/edit#gid=0
Once data is collected need to figure out where to ingest into db
Script can be run, data uploaded to s3 as well
Unfortunately I can't open this google sheet in tableau. Error message: The Google Sheets service reported an unrecognized error when processing this request. This file is too large to be exported.
Here's the URL to the sheet that I can successfully open. Is there any way to do a "diff" of the two to see what the issue is? Perhaps the one you're exporting programmatically has extra rows or columns in it that mine doesn't have when exported from Postgres via pgAdmin csv export.
sorry, i'm an idiot. submitted feedback on the wrong issue.
hi. i'm noticing this data in the Source Data schema hasn't been updated since 4/27/2021. Is this data load not automated yet? Or did something change to disrupt the data flow?
hi. i'm noticing this data in the Source Data schema hasn't been updated since 4/27/2021. Is this data load not automated yet? Or did something change to disrupt the data flow?
what data source are you referring to? Can you provide a URL?
sorry, this is for the Families for Safe Streets data that comes from a google sheet and gets loaded into source_data.dc_fss table
source of data is here: https://docs.google.com/spreadsheets/d/1uCDD5ox8t4WDNB377PomsS3PS_QInaoE4_lGmqtxb-M/edit#gid=0
The data was being uploaded to a private bucket, just added a script to upload the data to a public bucket as well
https://dc-crash-bot-public.s3.amazonaws.com/source-data/dc-fss/dc_fss.csv
i need to throw a bash script/cron job on the EC2 instance that pulls the csv into the database every night! Doing that now. Also Banjo about to shoot you the certificate and the URL to it.
On Mon, Jul 26, 2021 at 9:41 PM Banjo Obayomi @.***> wrote:
The data was being uploaded to a private bucket, just added a script to upload the data to a public bucket as well
https://dc-crash-bot-public.s3.amazonaws.com/source-data/dc-fss/dc_fss.csv
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/CharlotteJackson/DC_Crash_Bot/issues/65#issuecomment-887143122, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKSPWEVYOGHDTKBBVFHCYBTTZYFD3ANCNFSM427XNEWA .
What is the Task
Set up a pipeline to a Google sheet with up-to-date fatalities information
Why do we want to do this
The official crashes dataset does not accurately reflect fatality counts
How can I get started?
Identify the google sheet to use, or create one ourselves
Definition of Done
When there's an automated pipeline into the crashes database