HTTPArchive / data-pipeline

The new HTTP Archive data pipeline built entirely on GCP
Apache License 2.0

Standard library of custom BigQuery functions #118

Open rviscomi opened 2 years ago

rviscomi commented 2 years ago

BigQuery supports custom functions that are stored at the dataset level and can be made publicly reusable. This is a useful way to abstract away the boilerplate SQL needed for common routines. Here's an example of a query that generates custom functions for CWV analysis. We could also write functions for other HA-centric use cases like URL manipulation, HTTP header parsing, etc.
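To make the idea concrete, here is a minimal sketch of what one such function definition could look like, rendered from Python so it can be kept alongside tooling. The dataset name `mydataset`, the function name `IS_GOOD`, its parameters, and the helper `render_sql_udf` are all hypothetical and not taken from the linked query; the 75% threshold follows the usual CWV convention and is only an assumption here.

```python
# Hypothetical dataset name, for illustration only.
DATASET = "mydataset"


def render_sql_udf(name, params, returns, body, dataset=DATASET):
    """Render a BigQuery CREATE OR REPLACE FUNCTION statement for a SQL UDF.

    params is a list of (argument_name, sql_type) pairs.
    """
    signature = ", ".join(f"{arg} {sql_type}" for arg, sql_type in params)
    return (
        f"CREATE OR REPLACE FUNCTION {dataset}.{name}({signature})\n"
        f"RETURNS {returns}\n"
        f"AS (\n  {body}\n);"
    )


# Hypothetical CWV helper: true when at least 75% of experiences are "good".
IS_GOOD_SQL = render_sql_udf(
    name="IS_GOOD",
    params=[
        ("good", "FLOAT64"),
        ("needs_improvement", "FLOAT64"),
        ("poor", "FLOAT64"),
    ],
    returns="BOOL",
    body="SAFE_DIVIDE(good, good + needs_improvement + poor) >= 0.75",
)

if __name__ == "__main__":
    print(IS_GOOD_SQL)
```

Once a statement like this has been executed, a query could call `mydataset.IS_GOOD(...)` instead of repeating the division inline.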

We can maintain the functions as SQL files on GitHub for source control, which also lets the community submit improvements to existing custom functions or propose their own if they're generally useful.

A nice-to-have is a CI process that runs unit tests on the functions and executes their SQL to update them on BigQuery.
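A rough sketch of the deploy half of such a CI step, assuming one function per `.sql` file in a single directory. The function `deploy_functions`, the directory layout, and the regex check are illustrative assumptions, not an existing part of this repo. The `run_query` argument is any callable that executes one SQL string; in a real pipeline it might wrap `google.cloud.bigquery.Client().query(sql).result()`, but it is injected here so the logic can be tested without BigQuery access.

```python
import re
from pathlib import Path

# Loose check that a file starts with a UDF definition.
# This is an illustrative guard, not a full SQL parser.
CREATE_FUNCTION = re.compile(
    r"^\s*CREATE\s+OR\s+REPLACE\s+FUNCTION\b", re.IGNORECASE
)


def deploy_functions(sql_dir, run_query):
    """Validate every .sql file in sql_dir, then execute each one.

    Returns the deployed file names in the order they were executed.
    Raises ValueError before executing anything if a file does not
    start with CREATE OR REPLACE FUNCTION.
    """
    statements = []
    for path in sorted(Path(sql_dir).glob("*.sql")):
        sql = path.read_text(encoding="utf-8")
        if not CREATE_FUNCTION.match(sql):
            raise ValueError(
                f"{path.name}: expected a CREATE OR REPLACE FUNCTION statement"
            )
        statements.append((path.name, sql))

    # Execute only after all files have passed validation, so a bad file
    # does not leave the dataset partially updated.
    for _, sql in statements:
        run_query(sql)
    return [name for name, _ in statements]
```

Unit tests for the functions themselves would still need to run queries against BigQuery; this sketch only covers the "execute their SQL" part.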

max-ostapenko commented 1 week ago

Definitely a MUST after the legacy cleanup.