PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.99k stars, 219 forks

BigQuery integration tests #872

Open max-sixty opened 2 years ago

max-sixty commented 2 years ago

It would be great to have BQ integration tests, both because BigQuery is great and because of some of the issues we're facing, such as the https://github.com/prql/prql/issues/852 saga.

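Some questions to consider:

- I guess we need an auth token to do actual queries? I'm fine if there's some cost associated with the queries in CI; it would be great if it's easy to set up for people locally, probably using their own account. IIUC BQ has a decent amount of free quota per account.
- What's the best way of using BQ from Rust? It's very easy from Python. We could have the test suite in Python if needed, but it would be better to keep everything in a single crate.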

I used to work with @tswast on some BigQuery open-source work, and he built a lot of infra like this for the python data ecosystem. If he has any insight here that would be awesome (but no stress if you don't see this / don't have the capacity to respond, thank you!)

tswast commented 2 years ago

> I guess we need an auth token to do actual queries? I'm fine if there's some cost associated with the queries in CI; it would be great if it's easy to set up for people locally, probably using their own account. IIUC BQ has a decent amount of free quota per account.

Yes, you'll need a GCP account. Thankfully, there is a free tier, so you might even be able to get away with just using that without creating a billing account. https://cloud.google.com/bigquery/docs/sandbox

Not sure what service you are using for CI/CD, but you might be able to avoid creating a key file by using Workload Identity Federation. See this project for GCP auth on GitHub Actions: https://github.com/google-github-actions/auth#setup

For local auth, you should be able to use gcloud auth application-default login (https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login), though I'm not sure about the support for this. Alternatively, you could try to replicate some of the logic from https://github.com/pydata/pydata-google-auth for working with user credentials.
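
As a rough illustration of the local-auth point, here is a minimal Python sketch of how a test suite might check for credentials before attempting any BigQuery queries. The helper names (`adc_file_candidates`, `has_local_credentials`) are hypothetical and not part of any library. The sketch assumes the usual locations: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, and the `application_default_credentials.json` file that `gcloud auth application-default login` writes under the gcloud config directory.

```python
import os
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_file_candidates(env=None, home=None):
    """Return paths where Application Default Credentials are usually found.

    `env` and `home` are injectable so the function can be tested without
    touching the real environment (hypothetical helper, not a library API).
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    candidates = []
    # An explicitly configured credentials file is checked first.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    # On Windows, gcloud stores its config under %APPDATA%\gcloud.
    appdata = env.get("APPDATA")
    if appdata:
        candidates.append(Path(appdata) / "gcloud" / ADC_FILENAME)
    # On Linux and macOS, the file lives under ~/.config/gcloud.
    candidates.append(home / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


def has_local_credentials(env=None, home=None):
    """True if any of the candidate credential files exists."""
    return any(path.is_file() for path in adc_file_candidates(env, home))
```

With pytest, a check like this could feed `pytest.mark.skipif(not has_local_credentials(), reason="no GCP credentials")`, so that contributors without a GCP account can still run the rest of the suite.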

> What's the best way of using BQ from Rust? It's very easy from Python. We could have the test suite in Python if needed, but it would be better to keep everything in a single crate.

I recommend checking out https://github.com/mozilla-services/google-cloud-rust, which Google and Mozilla built in partnership. Unfortunately, an autogenerated client for BigQuery is unlikely to be particularly useful for running queries on its own (if that is possible at all, since the core API is REST, not gRPC), but there is an open issue at https://github.com/mozilla-services/google-cloud-rust/issues/25 which could be contributed to.

max-sixty commented 2 years ago

Awesome, thanks a lot @tswast! I'll check those out. Sounds like starting off with Python integration tests may be the easier path for the moment.
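
One possible shape for such a Python integration test, as a sketch only: compile each PRQL query to SQL, then ask BigQuery to validate the SQL with a dry run, which checks the query without executing it and so should keep CI costs down. The names `check_queries` and `bigquery_dry_run` are hypothetical. The PRQL compiler is passed in as a plain callable because the exact function exposed by the Python bindings is an assumption here. The BigQuery half assumes the `google-cloud-bigquery` package is installed and credentials are available.

```python
def check_queries(prql_queries, compile_prql, dry_run):
    """Compile each PRQL query and validate the resulting SQL.

    prql_queries: dict mapping a test name to PRQL source.
    compile_prql: callable taking PRQL and returning SQL (assumed to be
        supplied by the PRQL Python bindings).
    dry_run: callable taking SQL and raising if the backend rejects it.
    Returns a list of (name, error message) pairs for the failures.
    """
    failures = []
    for name, prql in prql_queries.items():
        try:
            sql = compile_prql(prql)
            dry_run(sql)
        except Exception as exc:  # collect every failure, don't stop early
            failures.append((name, str(exc)))
    return failures


def bigquery_dry_run(sql):
    """Validate SQL against BigQuery without running it.

    Imported lazily so the module loads even when google-cloud-bigquery
    is not installed.
    """
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up Application Default Credentials
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    # With dry_run=True the query is validated but not executed;
    # invalid SQL raises an exception here.
    client.query(sql, job_config=config)
```

A test would then assert that `check_queries(queries, compile_fn, bigquery_dry_run)` returns an empty list, where `compile_fn` stands for whichever compile function the bindings provide.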