Open CharlotteJackson opened 3 years ago
@eichler4dc where is the google sheet?
it might be easier for you to simply create your own google sheet using this query, and I can update my viz to point to it. I'd hate to break what is up and running now.
```sql
select reportdate,
       bicycle_fatalities as fatal_bicyclist,
       vehicle_fatalities as fatal_driver,
       pedestrian_fatalities as fatal_pedestrian,
       total_bicyclists as total_bicycles,
       total_pedestrians,
       drivers_impaired as driversimipared,
       drivers_speeding as speeding_involved,
       blockkey,
       0 fatalpassenger,
       left(smd_id, 2) anc_id,
       smd_id,
       st_X(geography) x_coord,
       st_Y(geography) y_coord,
       intersectionid
from analysis_data.dc_crashes_w_details
where reportdate between date '2015-01-01 00:00:01' and current_date
order by reportdate desc
```
Was able to create a script that refreshes the sheet with that query.
Hmm, Google Drive is converting smd_id values containing an "E" (e.g. "5E07") into scientific notation ("5.00E+7"). To combat that, I added quotes around the value so it shows up as "5E07" in the spreadsheet. Would that break anything in the viz, @eichler4dc?
Let me try this out and let you know!
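The quoting workaround described above could look something like this. This is only a sketch; `protect_smd_id` and the regex are illustrative assumptions, not the repo's actual code:

```python
import re

# Values like "5E07" look like scientific notation to Google Sheets'
# automatic type detection, which rewrites them as numbers ("5.00E+7").
# Hypothetical guard: detect at-risk IDs and wrap them in quotes so the
# sheet keeps them as text.
SCI_NOTATION_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)

def protect_smd_id(value: str) -> str:
    """Quote smd_id values that Sheets would misread as numbers."""
    if SCI_NOTATION_LIKE.match(value):
        return f'"{value}"'
    return value
```

IDs without the digit-E-digit shape (e.g. "1A05") pass through unchanged.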
I was trying this one and received the error I mentioned. I just added my comment to the wrong issue. Same error message: The Google Sheets service reported an unrecognized error when processing this request.
This file is too large to be exported.
As I noted in the other issue, this sheet works fine in Tableau. It was created by uploading a CSV. You have to turn off the option to automatically determine field data types to fix the scientific notation problem. https://docs.google.com/spreadsheets/d/1IAEwjnRUvcGfhbzk_2_5Uso2SdhLbhht_zSdyXXBS0I/edit#gid=688603079
I tried manually downloading (exporting) the "dc_crash_bot" data from your gsheet and it worked fine. Not sure what the error might be.
My guess is that because of the extra data (columns and rows) in the dataset, the CSV is larger than the allowed size, which is somewhere between 10 and 12 MB according to https://kb.tableau.com/articles/issue/error-this-file-is-too-large-to-be-exported-when-extracting-data-from-google-sheet
Cause
The ability to connect to Google Sheets greater than 10 MB is currently not built into the product.
Here are the current sizes of the CSV files:

```
19M  gdrive_auto_upload.csv              <-- new sheet
12M  crash_data_for_gsheet_20210403.csv  <-- old sheet
```
Solutions would be to either
Let me experiment a bit. I was thinking that too but something else might be up.
Good call on exceeding the file size. I must have been right at the limit and only now exceeded it. Let's try updating the query to this one, which groups by blockkey, the date rounded to the nearest month, and the single-member district ID. It brings the total dataset down a bit, not a lot, but it should be enough to buy us some time. Thanks!!!
```sql
select date_trunc('month', reportdate) as reportdate,
       sum(bicycle_fatalities) as fatal_bicyclist,
       sum(vehicle_fatalities) as fatal_driver,
       sum(pedestrian_fatalities) as fatal_pedestrian,
       sum(total_bicyclists) as total_bicycles,
       sum(total_pedestrians) as total_pedestrians,
       blockkey,
       smd_id,
       count(*) crash_count,
       avg(st_X(geography)) x_coord,
       avg(st_Y(geography)) y_coord
from analysis_data.dc_crashes_w_details
where reportdate between date '2015-01-01 00:00:01' and current_date
group by date_trunc('month', reportdate), blockkey, smd_id
order by reportdate desc
```
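For sanity-checking the rollup, the same month truncation that `date_trunc('month', reportdate)` performs can be reproduced in Python (a sketch; `trunc_month` is an illustrative helper, not part of the project):

```python
from datetime import datetime

def trunc_month(ts: datetime) -> datetime:
    """Truncate a timestamp to the first instant of its month,
    mirroring Postgres date_trunc('month', ...)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
```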
With that query, the sheet is still too big:

```
15M  gdrive_auto_upload.csv
```
Six years of data is too much; we will need to narrow the time range in the `where reportdate between date '2015-01-01 00:00:01'` clause.
Using 2019 as the start date, I was able to get the CSV down to about 5 MB:

```
4.9M  gdrive_auto_upload.csv
```
I have made a WDC (web data connector) at http://35.170.77.227:8888/dc_crash_bot.html to serve the data from the query, since Google Drive won't work.
Neat. Okay, I was able to load this into Tableau without issue and was going to upload a test to the server to see if it auto-refreshes like we want, but Tableau froze on me. It might be an incompatibility between my new Mac (Apple silicon) and Tableau, which isn't native to that processor yet. I'm thinking that if we don't have the file size constraints of Google Sheets, we could flip back to the previous version of the query:
```sql
select --*,
       reportdate,
       bicycle_fatalities as fatal_bicyclist,
       vehicle_fatalities as fatal_driver,
       pedestrian_fatalities as fatal_pedestrian,
       total_bicyclists as total_bicycles,
       total_pedestrians,
       drivers_impaired as driversimipared,
       drivers_speeding as speeding_involved,
       blockkey,
       0 fatalpassenger,
       left(smd_id, 2) anc_id,
       smd_id,
       st_X(geography) x_coord,
       st_Y(geography) y_coord,
       intersectionid
from analysis_data.dc_crashes_w_details
--where reportdate between date '2021-04-15 03:35:31' and current_date
where reportdate between date '2015-01-01 00:00:01' and current_date
order by reportdate desc
```
Updated
Oh, great! This is mostly working. Two issues:
1) When I can see the dates properly, there are some outside my specified date range. Are you keeping the date where clause I passed along? Between 1/1/2015 and the system date?
2) Can you verify that ReportDate is formatted like this: `2017-04-18 22:20:52+00`? When I pull this data in directly, my existing viz converts all the dates to 1899-12-30 00:00:00 for some reason. I think it's because I had the dates formatted one way in the past and perhaps you are passing them along in a different format? Update: Oh, I think your dates have an extra ":00" at the end. Can you trim that off?
Thanks!!!
I just did a push to format reportdate for the viz.
Thanks for this. I tried dropping this new data into the viz and encountered a set of errors. I might need to recreate the viz at this point; it would be easier and faster than continuing to debug why this new dataset from the web data connector is incompatible with my old one. It should only take me an hour or two.
Before I decide to rebuild entirely, could we try adding the report date back in as another field, this time in text format? Here's the updated query:
```sql
select --*,
       reportdate,
       bicycle_fatalities as fatal_bicyclist,
       vehicle_fatalities as fatal_driver,
       pedestrian_fatalities as fatal_pedestrian,
       total_bicyclists as total_bicycles,
       total_pedestrians,
       drivers_impaired as driversimipared,
       drivers_speeding as speeding_involved,
       blockkey,
       0 fatalpassenger,
       left(smd_id, 2) anc_id,
       smd_id,
       st_X(geography) x_coord,
       st_Y(geography) y_coord,
       intersectionid,
       to_char(reportdate, 'YYYYMMDD HH24:MI:SS') report_date_string
from analysis_data.dc_crashes_w_details
where reportdate between date '2015-01-01 00:00:01' and current_date
order by reportdate desc
```
Thanks!
Updated
OK @eichler4dc, I think I may have fixed the issue.
I found all the supported Tableau date formats here: https://tableau.github.io/webdataconnector/docs/wdc_ref_date_formats.html. The `+00` suffix was invalid and caused everything to be treated as a string. I updated the script to use the correct format, and the connector now accepts the datetime values correctly.
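The fix described above amounts to stripping the Postgres-style UTC offset before handing dates to the connector. A minimal sketch, assuming the offset always appears as a trailing `+NN`; `normalize_report_date` is an illustrative name, not the script's actual function:

```python
# Tableau's WDC accepts datetimes like "2017-04-18 22:20:52" but a
# trailing "+00" offset makes it fall back to treating the whole column
# as strings. Strip the offset so the WDC parses a real datetime.
def normalize_report_date(value: str) -> str:
    """Remove a trailing UTC offset such as '+00' from a timestamp string."""
    if "+" in value:
        value = value.split("+", 1)[0]
    return value.strip()
```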
This is great news. I was driving myself crazy trying to figure this out.
Before seeing this update, I pulled the data from the WDC into a new workbook and am finding it's only partial data. See below. I just queried the underlying data via pgAdmin and it matches, so the data is missing at the source. @CharlotteJackson, was there a change to the data loader? Or is it missing from the DC Open Data portal as well?
Perhaps I should show folks how easy it is to validate whether the WDC is working as expected, so you can debug it yourselves instead of waiting for me to try it out in my viz?
ZOMG, it works. @banjtheman, thank you!!! I have to figure out one more thing before I publish it in place of my old dashboard, but here's a sneak peek: https://public.tableau.com/profile/michael.eichler#!/vizhome/DCCrashesbyBlock2010to2021-live/CrashesbyANCBlock
Awesome! Are we good to close this issue?
What is the task?
Write a script that refreshes a Google Sheet at a regular interval with the latest data from the crashes table on our PostGIS db.
Why do we want to do this?
So Michael's Tableau visualization will update automatically.
How can I get started?
Check Stack Overflow for examples :)
Definition of done
When no manual intervention is needed to refresh the Tableau visualization with the latest data.
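The refresh flow in the task above is: run the crashes query against the PostGIS db, serialize the rows to CSV, and push that CSV to the Google Sheet on a schedule (e.g. cron). The upload step needs a Sheets client and credentials, so only the serialization step is sketched here; `rows_to_csv` and the shortened column list are illustrative assumptions, not the repo's actual code:

```python
import csv
import io

# Hypothetical subset of the columns the query returns; the real sheet
# has many more (blockkey, smd_id, coordinates, etc.).
COLUMNS = ["reportdate", "fatal_bicyclist", "fatal_driver", "fatal_pedestrian"]

def rows_to_csv(rows: list[dict]) -> str:
    """Serialize query rows (as dicts) into the CSV the sheet expects."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells rather than raising.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buf.getvalue()
```

The resulting string could then be written to `gdrive_auto_upload.csv` and handed to whatever Sheets upload mechanism the script uses.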