GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

Getting google.api_core.exceptions.BadRequest error when storing and referencing secrets in the connection configuration, with a MySQL source and a BigQuery target #1046

Closed: Jyotirm0y closed this issue 1 week ago

Jyotirm0y commented 10 months ago

We tried to use Secret Manager for storing and referencing secrets in our connection configuration, as described in https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#using-gcp-secret-manager

cmd

data-validation connections add \
    --secret-manager-type GCP \
    --secret-manager-project-id prj-01 \
    --connection-name bq_log_sa_rev BigQuery \
    --project-id 'gcp-dev-sa'
# prj-01 is the project id; 'gcp-dev-sa' is the Secret Manager secret name

Below is the connection JSON:

{"source_type": "BigQuery", "secret_manager_type": "GCP", "secret_manager_project_id": "prj-01", "project_id": "gcp-dev-sa"}

We get the error below when running a row hash validation where the source is MySQL and the target is BigQuery:

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/gcp-dev-sa/jobs?prettyPrint=false: ProjectId and DatasetId must be non-empty.

sundar-mudupalli-work commented 10 months ago

Jyotirmoy,

The most likely explanation is that DVT is not able to find the secret named gcp-dev-sa in Secret Manager. Either the secret does not exist or you don't have permission to read it.
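
One quick way to check is to try reading the secret directly with the same credentials DVT runs under (names taken from your connection JSON above); if this fails, DVT will fail the same way:

gcloud secrets versions access latest \
    --secret=gcp-dev-sa \
    --project=prj-01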

The current method of handling secrets does not work well for Kubernetes, so we are looking to change it. If you can wait a bit, we will have a better solution where we retrieve the whole database connection configuration from Secret Manager. When we do that, we will also provide better error messages.
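
For illustration, under the new approach the secret payload would be a complete connection JSON rather than a single value; a hypothetical MySQL example (all keys and values are illustrative, not the final format):

{"source_type": "MySQL", "host": "10.0.0.5", "port": 3306, "user": "dvt", "password": "...", "database": "testdb"}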

Sundar Mudupalli

Jyotirm0y commented 10 months ago

@sundar-mudupalli-work

The service account already has the Secret Manager Secret Accessor role for the secret named 'gcp-dev-sa'. Also, we're not using Kubernetes; DVT is installed on a Linux VM that is already authenticated via the above service account. We can successfully run the same MySQL-to-BigQuery row hash validation with the same connection JSON, without Secret Manager, using the command below:

data-validation connections add --connection-name Bq_conn \
BigQuery --project-id prj-01

nehanene15 commented 10 months ago

@Jyotirm0y Can you provide the command you are using for row hash validation using the BQ connection w/secret manager?

nj1973 commented 10 months ago

@Jyotirm0y, if I understand correctly, you are using a secret to hide the BigQuery project id. Please update if this is not what you are testing.

Assuming my understanding is correct I tested using a secret for the BigQuery project id:

$ data-validation connections add \
    --secret-manager-type GCP \
    --secret-manager-project-id dbbb-proj-01 \
    --connection-name bq_dbbb_secret BigQuery \
    --project-id 'nj-ctop-bq-proj'

$ cat bq_dbbb_secret.connection.json
{"source_type": "BigQuery", "secret_manager_type": "GCP", "secret_manager_project_id": "dbbb-proj-01", "project_id": "nj-ctop-bq-proj"}

And then ran row validation successfully:

data-validation  validate row -sc=pg -tc=bq_dbbb_secret -tbls=dvt_test.tab_num \
--primary-keys=id --hash=col_num_17,col_num_18

Are you also using a secret in the MySQL connection? Could that be where the problem lies?

Jyotirm0y commented 10 months ago

@nj1973 Not trying to hide the project ID. Currently we can access BQ via the authorized service account that is already logged in on the terminal. However, our ideal scenario would be to fetch the SA private key from Secret Manager when accessing BQ, since the SA could be different. We haven't tested anything for MySQL yet, but going forward we would like to fetch the MySQL or Oracle credentials from Secret Manager as well.
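
For example, roughly what we have in mind (the secret name dvt-sa-key is illustrative):

# Fetch the SA key JSON from Secret Manager, then authenticate with it
gcloud secrets versions access latest --secret=dvt-sa-key --project=prj-01 \
    > /tmp/dvt-sa-key.json
gcloud auth activate-service-account --key-file=/tmp/dvt-sa-key.json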

@nehanene15

data-validation validate row --source-conn CONN_NAME --target-conn bq_log_sa --tables-list testdb.billed_handset_code_sql_new=project_id.test.billed_handset_code_test --primary-keys usi_channel_id --hash '*' --filters 'usi_channel_id<3'

nj1973 commented 10 months ago

An alternative to storing the SA private key would be to have a dedicated GCE VM for DVT and attach the correctly privileged service account to the VM. I think that ticks both the convenience and security boxes.

I have tested using secrets for Oracle username and password in the past so you shouldn't have any trouble there as long as all privileges are correctly configured. I suspect you will get the same error message as in your project id test until you figure out which permissions are missing.

Jyotirm0y commented 10 months ago

@nj1973

  1. We might not always get a dedicated VM, so the preferred option would be to authenticate with a service account securely.
  2. As per our understanding, all permissions are already available. Can you please share the exact steps and permissions you tested, so that we can replicate them on our side?

nj1973 commented 10 months ago

The permissions my test user has are:

gcloud secrets add-iam-policy-binding projects/12345/secrets/nj-dvt-pwd \
    --member="serviceAccount:dvt-service-account@my-project.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"

Perhaps the Viewer role is the one that is too permissive. I used that because I am working in a test project.
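
To double-check what is actually bound on the secret itself, you can dump its IAM policy (same secret path as above):

gcloud secrets get-iam-policy projects/12345/secrets/nj-dvt-pwd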

Jyotirm0y commented 10 months ago

Hi @nj1973, I understand that Viewer access was fine for a test project; however, it won't be approved for us in the client subscription. So is there any way to get the full list of required IAM roles/permissions?

sundar-mudupalli-work commented 10 months ago

Hi folks,

I would suggest a pause to this discussion. I have created a pull request that affects the way we use secrets to store connection configuration. It is a breaking change - the value stored in the secret will need to change.

The old code saved individual keys from the connection configuration in a secret; the new approach is to save the complete connection configuration (user name, password, host, port, etc.) in the secret. There is also improved exception handling.

If you want to try the new code, use this branch - it appears to work and I would be happy to receive any bug reports. The specific code changes for exception handling can be compared between the old code here and the new code here - around line 48 or so.
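
A minimal sketch of trying it out (assuming the K8-indexed branch name that appears later in this thread; exact install steps may differ in your environment):

git clone https://github.com/GoogleCloudPlatform/professional-services-data-validator.git
cd professional-services-data-validator
git checkout K8-indexed
pip install -e .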

Thanks.

Sundar Mudupalli

Jyotirm0y commented 10 months ago

Hi @nj1973, @sundar-mudupalli-work, @nehanene15 , I have created a new virtual env where I cloned the mentioned branch and installed DVT.

git branch -a
* K8-indexed

google-pso-data-validator 4.2.0

But when I try to create the connection to the target BigQuery:

data-validation connections add --connection-name Bq_conn BigQuery --project-id prjoject01

I am getting the below error:

Traceback (most recent call last):
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/bin/data-validation", line 33, in <module>
    sys.exit(load_entry_point('google-pso-data-validator==4.2.0', 'console_scripts', 'data-validation')())
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/bin/data-validation", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/lib/python3.9/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/data_validation/__main__.py", line 23, in <module>
    from data_validation import (
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/data_validation/cli_tools.py", line 55, in <module>
    from data_validation import clients, consts, state_manager
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/data_validation/clients.py", line 23, in <module>
    from google.cloud import bigquery
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
    from google.cloud.bigquery.client import Client
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 56, in <module>
    from google.resumable_media.requests import MultipartUpload  # type: ignore
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/resumable_media/requests/__init__.py", line 665, in <module>
    from google.resumable_media.requests.download import ChunkedDownload
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/resumable_media/requests/download.py", line 17, in <module>
    import urllib3.response  # type: ignore
  File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/urllib3/__init__.py", line 41, in <module>
    raise ImportError(
ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips  26 Jan 2017'.

nehanene15 commented 8 months ago

@Jyotirm0y The K8-indexed branch has now been merged - can you pull the latest code and see if you're still having any issues? You should be able to use Secret Manager to store BQ and Oracle credentials.
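
As a side note, the ImportError above is a known urllib3 v2 / OpenSSL 1.0.2 incompatibility rather than a DVT issue; a common workaround is to pin urllib3 below v2 inside the virtual env:

pip install 'urllib3<2'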

helensilva14 commented 1 week ago

Closing this question-labeled issue due to a long period without responses.