CityOfLosAngeles / aqueduct

A shared pipeline for building ETLs and batch jobs that we run at the City of LA for Data Science Projects. Built on Apache Airflow and Civis Platform.
Apache License 2.0

Data warehouse catalog service #320

Closed by ian-r-rose 4 years ago

ian-r-rose commented 4 years ago

It would be nice to share a view into the Civis data warehouse tables using a service, so that external stakeholders could easily inspect them. Not the actual data, but maybe the available tables and some metadata.

ian-r-rose commented 4 years ago

Looked into this yesterday, and I think we might be overthinking it. I had some success creating Panel/ipywidgets-based GUIs that showed the data tables to the user. However, they were somewhat annoying to use, since they involved several requests to the database backend, each of which had several seconds of overhead. Combine this with any k8s pod startup time, and it takes quite a while before the user sees any content. Now, I could probably reduce the number of requests to the backing databases by putting together a more complex query, but I think there would be diminishing returns.

The content is, at the end of the day, a not-very-dynamic view of some tabular data. What if instead of a live service, we scheduled a job that made a nice static HTML file? This could be hosted on gh-pages like anything else. It could even have some light interactivity with offline ipywidgets.
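As a rough illustration of the static-file idea, here is a minimal sketch using only the Python standard library. It assumes the table metadata has already been fetched as rows of (schema, table, column, data type), for example from a single query run by the scheduled job. The sample rows, function names, and output filename are all hypothetical, not something that exists in this repo.

```python
from collections import defaultdict
from html import escape
from pathlib import Path

# Hypothetical metadata rows: (schema, table, column, data type).
# In a real job these would come from one query against the warehouse.
SAMPLE_ROWS = [
    ("public", "bike_trips", "trip_id", "integer"),
    ("public", "bike_trips", "start_time", "timestamp"),
    ("transit", "bus_stops", "stop_id", "integer"),
]


def group_by_table(rows):
    """Group column metadata under a 'schema.table' key."""
    tables = defaultdict(list)
    for schema, table, column, dtype in rows:
        tables[f"{schema}.{table}"].append((column, dtype))
    return dict(tables)


def render_catalog(rows, title="Data warehouse catalog"):
    """Return a standalone HTML page listing each table and its columns."""
    parts = [
        "<!DOCTYPE html>",
        f"<html><head><meta charset='utf-8'><title>{escape(title)}</title></head>",
        f"<body><h1>{escape(title)}</h1>",
    ]
    for name, columns in sorted(group_by_table(rows).items()):
        # <details> gives collapsible sections without any JavaScript.
        parts.append(
            f"<details><summary>{escape(name)} ({len(columns)} columns)</summary>"
        )
        parts.append("<table><tr><th>Column</th><th>Type</th></tr>")
        for column, dtype in columns:
            parts.append(
                f"<tr><td>{escape(column)}</td><td>{escape(dtype)}</td></tr>"
            )
        parts.append("</table></details>")
    parts.append("</body></html>")
    return "\n".join(parts)


def write_catalog(rows, out_path="index.html"):
    """Write the rendered page to disk and return its path."""
    path = Path(out_path)
    path.write_text(render_catalog(rows), encoding="utf-8")
    return path
```

A scheduled job could call `write_catalog(rows)` and publish the resulting `index.html` to gh-pages. The `<details>` elements stand in for the "light interactivity" here; offline ipywidgets would be a separate option that this sketch does not cover.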

hunterowens commented 4 years ago

yeah, I think that also makes sense. explaining it on the call yesterday made me realize this is much simpler than I thought.