CityOfLosAngeles / aqueduct

A shared pipeline for building ETLs and batch jobs that we run at the City of LA for Data Science Projects. Built on Apache Airflow & Civis Platform
Apache License 2.0

Add notebook for creating public catalog for Civis data warehouses on github pages #328

Closed ian-r-rose closed 4 years ago

ian-r-rose commented 4 years ago

This creates an ipywidgets-based static HTML file out of the current list of tables in the Civis data warehouse, and uploads it to GitHub pages: https://cityoflosangeles.github.io/aqueduct/catalog.html

Nothing fancy, but perhaps useful for sharing around.

Fixes #320

hunterowens commented 4 years ago

oooh, this looks good.

one thing I think we need to resolve is making sure the robot user can see all schemas / tables

ian-r-rose commented 4 years ago

Yeah, good point. We should make sure to discuss permissions issues (cf. #325) tomorrow

hunterowens commented 4 years ago

seems like the best way to do this is to serve it with voila as the robot user, and make sure the robot user can read every schema / table?

ian-r-rose commented 4 years ago

Yeah. I wasn't fully satisfied with the user experience there due to the amount of time it took to crawl the tables. Could probably cut down the number of requests and do some async work to speed it up, though.
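A minimal sketch of the async idea, not the notebook's actual code: `fetch_table_metadata` is a hypothetical stand-in for a single metadata request to the warehouse (here it just sleeps), and `max_concurrency` is an assumed knob for capping how many requests are in flight at once.

```python
import asyncio


async def fetch_table_metadata(table_id):
    # Hypothetical placeholder for one metadata request to the warehouse.
    # A real version would await an HTTP call instead of sleeping.
    await asyncio.sleep(0.01)
    return {"id": table_id, "name": f"table_{table_id}"}


async def crawl_tables(table_ids, max_concurrency=8):
    # The semaphore caps the number of simultaneous requests.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(table_id):
        async with semaphore:
            return await fetch_table_metadata(table_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded_fetch(t) for t in table_ids))


def crawl(table_ids, max_concurrency=8):
    # Synchronous entry point for use from a script.
    return asyncio.run(crawl_tables(table_ids, max_concurrency))
```

Calling `crawl(range(20), max_concurrency=5)` from a plain script returns one metadata dict per table, with at most five requests running concurrently. Inside a notebook, where an event loop is already running, `await crawl_tables(...)` would be used instead.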

hunterowens commented 4 years ago

ah perhaps then better to schedule and serve the HTML as a static HTML report
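A rough sketch of what the static-report step could look like, using only the standard library rather than ipywidgets: `render_catalog` is a hypothetical helper, and the `(schema, table)` tuples are an assumed input shape, not what the notebook actually produces.

```python
import html


def render_catalog(tables):
    # tables: iterable of (schema, table_name) pairs (assumed shape).
    # Names are escaped so unusual characters cannot break the markup.
    rows = "\n".join(
        f"<tr><td>{html.escape(schema)}</td><td>{html.escape(name)}</td></tr>"
        for schema, name in sorted(tables)
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Data catalog</title></head>\n'
        "<body><table>\n"
        "<tr><th>Schema</th><th>Table</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>\n"
    )
```

A scheduled job could write the returned string to `catalog.html` and publish it, so readers never wait on the table crawl.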