Open hunterowens opened 4 years ago
cc @tiffanychu90 @trantom
Specifically, this notebook is one where City Planning would probably want as a PowerBI report.
Cell 4 is where we're at a bit of an impasse:
Took a look at python/R integration into powerbi today. In theory you can load a pandas/dplyr dataframe and hand it off to a dashboard element. In practice, it really doesn't seem like it works well at all. It does a really bad job of finding a local environment, has ugly shims that break scripts, and I was more-or-less unable to get anything to work.
There are connectors for web URLs and Redshift -- those may ultimately be better options
I would say using a default connector would significantly reduce the overhead by PMU to access this data and others who use PowerBI. The other default connectors can also support some of the DBMS that Planning has as well I think it is a matter of getting a current environment description and agreement on who may be supporting it after the transfer through working with David Terukina and Mony's teams on the systems side. Our GIS side has data on SQL Server and the spatial data for the web gis on an ArcGIS data store.
@ian-r-rose The Python integration for me seems more for ad-hoc reports and less for live or dynamically updated ones. I think Python visualizations should be kept in their native environment since trying to work in Python with PowerBI is additional setup overhead we want to abstract away. Also, there are data input limits and no cross-filtering which is a key feature of PowerBI reports. Microsoft spells out the limitations here.
Pipeline test from Civis -> Google Sheets -> PowerBi worked
Pipeline test from Civis -> Google Sheets -> PowerBi worked
Very nice
I was also able to successfully run the civis-df-to-drive-test.py
locally instead of on platform to the Scratch schema on the postgres cluster and then on Civis export and overwrite a Google sheet. However the main caveat is the limit 5,000,000 cells.
100,000 records x 22 columns was okay 200,000 records x 22 columns gives an error even though it's not 5,000,000 cells yet.
Export output log:
Failed to sync some datasets. The following errors were encountered: Too many failures uploading to Google Drive, please rerun or check job settings, the most recent failure error was: Error appending to worksheet named: Sheet1: badRequest: This action would increase the number of cells in the workbook above the limit of 5000000 cells.
Currently, the ITA team and Planning PMU team have been working on getting a project done regarding planning entitlements.
This project consists of 3-4 source datasets, some static (Census, etc) and a couple dynamic (PCTS - Planning Case Tracking System).
We have a number of cleaning scripts (see
src
) and a custom package for subsetting this data (seelaplan
). We'd like to expose this to end users in the planning performance management section (Power BI, Excel, GDrive) and the GIS unit (Arc) to start.Here are the options I see, but curious if Civis has worked with this before.
Ref