Closed: Clausewitz45 closed this issue 3 years ago
Hi,
How do other programs pick up the roles from the VM? Would you have an example in Python to show me?
thanks
Hi,
many thanks for the response. I talked with one of our developers, and he came back to me with these two links:
Since I'm only an engineer, I cannot judge if this is enough to start, but I will try to add more examples later.
Thanks
Came here interested in this. When deployed in GCP, the metadata server is queried for authentication tokens. Instead of explicitly calling the auth function here, you should first check whether the user specified a key path; if they did not, just create the client like this:
self.client = bigquery.Client()
This will use Application Default Credentials when running on a developer's machine, and the metadata server when running in GCP. The key option is still useful for other scenarios, like running in AWS or something self-hosted. The implicit setup will also look at the GOOGLE_APPLICATION_CREDENTIALS environment variable for a key path.
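A minimal sketch of that lookup order, for clarity. The helper name and structure here are mine, not part of bigquery_fdw; it only models which auth path the client would end up taking:

```python
import os

def resolve_auth_mode(key_path=None):
    """Hypothetical helper: which auth path the BigQuery client would take.

    Mirrors the order described above: an explicit key path wins, then the
    GOOGLE_APPLICATION_CREDENTIALS environment variable, and finally
    Application Default Credentials (the ADC file on a developer's machine,
    or the metadata server when running in GCP).
    """
    if key_path:
        return "explicit_key"
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "env_key"
    return "application_default"
```

In the last two cases the FDW would not need to pass anything: with google-cloud-bigquery installed, plain `bigquery.Client()` performs that fallback itself.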
@preston-hf would you be able to create a pull request with this change? I don't use GCP, so it's hard for me to test this use case.
thanks
I was just about to start working on this, but then I dived into the code and realized that you, @gabfl, support different credentials on a per-client basis, which makes this way more complicated.
While it is cool, I really don't think you should allow or support that flow, because it runs counter to how Google aims to manage credentials.
The GOOGLE_APPLICATION_CREDENTIALS environment variable that @preston-hf brings up, which points to a service account key, is really the way to go. If you want to connect to BigQuery tables across multiple projects, you need to give the service account you create BigQuery access in those other projects.
You can read more about credentials here: https://cloud.google.com/docs/authentication/getting-started
Another reason to update this is that, for my use case, I'm running a Postgres container in a GKE cluster and we use a feature called Workload Identity to manage access: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#overview
This only works if you rely on google's default authentication mechanism (i.e. not explicitly providing a service account key).
How would you feel about such changes?
Essentially, we would get rid of the fdw_key option, and users would need to make sure the GOOGLE_APPLICATION_CREDENTIALS variable is set and points to a service account key file readable by postgres.
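That prerequisite could be sanity-checked from the FDW side before constructing the client. A hedged sketch (the function is hypothetical, not existing bigquery_fdw code):

```python
import os

def check_adc_env():
    """Hypothetical startup check for the proposed setup: verify that
    GOOGLE_APPLICATION_CREDENTIALS points at a key file the postgres
    process can actually read."""
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        # Not necessarily an error: the client falls back to ADC or
        # the metadata server when running inside GCP.
        return "unset: client will fall back to ADC / metadata server"
    if not os.access(key_path, os.R_OK):
        return f"set but unreadable by this process: {key_path}"
    return f"ok: {key_path}"
```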
@shadiramadan
Apologies for the delay in responding.
I tend to agree that people using different values for fdw_key is a very narrow edge case and might not exist in practice. I think the suggested change makes sense.
How would you proceed to have Postgres/Python be able to read that env variable?
Would you be willing to work on a pull request?
Usage of bigquery_fdw should be possible without creating a key at all. Keys have a bunch of security downsides and should be avoided where possible. I don't think you need an environment variable either; I believe the client automatically looks for the relevant variables.
Indeed it looks like the client does look for env variables if nothing is set on the application side: https://cloud.google.com/bigquery/docs/authentication/getting-started
I don't use GCP on a daily basis but is the env variable available by default on some GCP instances?
It doesn't actually use environment variables on GCP. The way it works is that each VM/Cloud Function/etc. has a "metadata server" available at metadata.google.internal, which resolves to a link-local address. The client libraries make a request to this service to generate auth tokens and discover other info about the runtime environment.
In addition, for developers, you typically set up Application Default Credentials using the SDK, and the client libraries check a "well-known" path for the ADC token. Basically, for most environments, just calling bigquery.Client() should just work; if you need to override the defaults to work in a non-GCP prod environment, you can set the environment variables, which should be picked up. Unfortunately I'm not using this library at the moment, so I can't test it out.
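For the curious, the metadata-server exchange the client libraries do under the hood can be sketched in plain Python. This is an illustration only: the URL is Google's documented token endpoint, which resolves only from inside GCP, so the request itself is not performed here:

```python
import urllib.request

# Documented token endpoint on the link-local metadata server;
# only reachable from inside GCP.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1"
    "/instance/service-accounts/default/token"
)

def build_metadata_token_request():
    # The "Metadata-Flavor: Google" header is mandatory; the server
    # rejects requests without it as a safeguard against SSRF.
    return urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )

# On a GCP VM you would then do something like:
#   with urllib.request.urlopen(build_metadata_token_request()) as resp:
#       token = json.load(resp)["access_token"]
```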
I wrote a draft (untested) PR here: https://github.com/gabfl/bigquery_fdw/pull/20. I will stage it and test it when I get a chance.
If any of you would like to contribute to the testing/finalizing, it would be really helpful
@preston-hf @Clausewitz45 @shadiramadan The PR is ready to be merged, I will finalize some testing over the weekend and merge it. Please let me know if you have some feedback on the changes in the meantime.
Changes to the authentication process are documented here: https://github.com/gabfl/bigquery_fdw/tree/native-creds#authentication
Merged, and version 1.8 has been released.
Hi,
I would like to implement your solution (which is great!), but because of CIS Benchmark GCP 1.4 (no user-managed service account keys are enabled, so I basically cannot create keys for the service accounts), I can only assign the required roles to the Compute Engine virtual machine. Is this FDW able to pick up the roles from the VM itself without supplying any key to it?
Thank you for your response in advance.