anelendata / tap-bigquery

Singer.io tap for extracting data from BigQuery tables
Apache License 2.0

Remove or find a workaround to authenticate using SAs #29

Closed AlejandroUPC closed 4 months ago

AlejandroUPC commented 4 months ago

Right now you are forced to have the GOOGLE_APPLICATION_CREDENTIALS variable set, because it is also set by default to $MELTANO_PROJECT_ROOT/client_secrets.json, as specified here.

The issue is that if you want to authenticate without using keys (whether an actual JSON file or a string representation of one), you can't. The bigquery.Client call seems to check whether this variable is set and, if so, tries to authenticate using the file named in the variable's value, as specified here.

You might want to do this if you run your application in Kubernetes and use service accounts to authenticate, which is somewhat safer because the machine running the code does not need to know any credentials (this is the particular case blocking us right now). A more detailed explanation by Google can be found here, but in short:

Many Google Cloud services let you attach a service account that can be used to provide credentials for accessing Google Cloud APIs. If ADC does not find credentials it can use in either the GOOGLE_APPLICATION_CREDENTIALS environment variable or the well-known location for local ADC credentials, it uses the metadata server to get credentials for the service where the code is running.
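The lookup order in that quote can be modeled in a few lines. This is a simplified stand-in for illustration, not the google-auth implementation: the function name `resolve_adc_source` and the returned labels are hypothetical, the well-known path shown is the Linux/macOS one, and no credentials are actually loaded.

```python
import os
from pathlib import Path

# Linux/macOS location written by `gcloud auth application-default login`.
WELL_KNOWN_FILE = Path(".config") / "gcloud" / "application_default_credentials.json"


def resolve_adc_source(env, home):
    """Return a label for the source a simplified ADC lookup would pick.

    Hypothetical helper: it mirrors the documented order only and does not
    load or validate any credentials.
    """
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path is not None:
        # In this model an explicitly set variable always wins, and a
        # missing file is an error rather than a reason to fall through.
        if not os.path.isfile(key_path):
            raise FileNotFoundError(f"File {key_path} was not found.")
        return "key_file"
    if (Path(home) / WELL_KNOWN_FILE).is_file():
        return "local_adc_file"
    # Nothing configured locally: rely on the attached service account.
    return "metadata_server"
```

Under this model, a default value pointing at a client_secrets.json that does not exist means the metadata server branch is never reached, which matches the behaviour described above.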

I am not sure what the best fix is. Removing the default value might be a breaking change, so one option is an additional flag such as AUTHENTICATE_WITH_SA that just removes the variable using os.environ.pop.
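A minimal sketch of that flag, assuming it is read as a boolean-like environment variable. AUTHENTICATE_WITH_SA is only the name suggested above, not an existing tap-bigquery setting, and `drop_key_file_variable` is a hypothetical helper.

```python
import os


def drop_key_file_variable(env=os.environ):
    """Remove GOOGLE_APPLICATION_CREDENTIALS when AUTHENTICATE_WITH_SA is truthy.

    Returns True if the variable was removed. Intended to run before the
    BigQuery client is created, so the client's credential lookup no longer
    sees a key file path.
    """
    flag = env.get("AUTHENTICATE_WITH_SA", "").strip().lower()
    if flag not in ("1", "true", "yes"):
        # Flag absent or falsy: leave the environment untouched.
        return False
    # pop with a default so an already-unset variable is not an error.
    return env.pop("GOOGLE_APPLICATION_CREDENTIALS", None) is not None
```

Because the flag is opt-in, existing setups that rely on the default key file path would keep working unchanged.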

Looking for feedback from the owner and maintainers, and also checking whether this repo is active.

AlejandroUPC commented 4 months ago

I had a discussion with a colleague in Slack and there is a solution. You can override the setting without passing the env key: https://meltano.slack.com/archives/C068YBV6KEK/p1719398735756859