Closed — Jyotirmoy closed this issue 1 week ago
Jyotirmoy,
The most likely explanation is that DVT is not able to find the secret named gcp-dev-sa
in Secret Manager. Either the secret does not exist or you don't have permission to read it.
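One quick way to tell which of the two it is: try to read the secret directly with the same credentials DVT runs under. A minimal sketch, assuming the google-pso-data-validator docs' secret layout and the google-cloud-secret-manager Python client; the project and secret names below are taken from this thread and should be replaced with yours:

```python
# Sketch: check whether the secret DVT is looking for actually resolves.
# Assumes `pip install google-cloud-secret-manager` and ambient credentials
# (e.g. the VM's attached service account).

def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the fully qualified resource name Secret Manager expects."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"

def read_secret(project_id: str, secret_id: str) -> str:
    """Fetch the secret payload; raises NotFound or PermissionDenied on the
    two failure modes described above."""
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id)}
    )
    return response.payload.data.decode("UTF-8")

if __name__ == "__main__":
    print(secret_version_name("prj-01", "gcp-dev-sa"))
```

The equivalent check from the shell is `gcloud secrets versions access latest --secret=gcp-dev-sa --project=prj-01`; a NOT_FOUND or PERMISSION_DENIED response distinguishes the two cases.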
The current method of handling secrets does not work well for Kubernetes, so we are looking to change it. If you can wait a bit, we will have a better solution in which we retrieve the whole database connection configuration from Secret Manager. When we do that, we will also provide better error messages.
Sundar Mudupalli
@sundar-mudupalli-work
The service account already has the Secret Manager Secret Accessor role for the secret named 'gcp-dev-sa'. Also, we're not using Kubernetes; DVT is installed inside a Linux VM that is already authenticated via the above service account. We can successfully run a row hash validation with MySQL as the source and BigQuery as the target, using the same connection JSON without Secret Manager, created with the command below:
data-validation connections add --connection-name Bq_conn \
BigQuery --project-id prj-01
@Jyotirm0y Can you provide the command you are using for row hash validation with the BQ connection that uses Secret Manager?
@Jyotirm0y, if I understand correctly you are using a secret to hide the BigQuery project ID. Please correct me if this is not what you are testing.
Assuming my understanding is correct, I tested using a secret for the BigQuery project ID:
$ data-validation connections add \
--secret-manager-type GCP \
--secret-manager-project-id dbbb-proj-01 \
--connection-name bq_dbbb_secret BigQuery \
--project-id 'nj-ctop-bq-proj'
$ cat bq_dbbb_secret.connection.json
{"source_type": "BigQuery", "secret_manager_type": "GCP", "secret_manager_project_id": "dbbb-proj-01", "project_id": "nj-ctop-bq-proj"}
And then ran row validation successfully:
data-validation validate row -sc=pg -tc=bq_dbbb_secret -tbls=dvt_test.tab_num \
--primary-keys=id --hash=col_num_17,col_num_18
Are you also using a secret in the MySQL connection? Could that be where the problem lies?
@nj1973 We are not trying to hide the project ID. Currently we can access BQ via the authorized service account that is already logged into the terminal. However, our ideal scenario would be to fetch the SA private key from Secret Manager while accessing BQ, as the SA could be different. We haven't tested anything for MySQL yet, but going forward we would like to fetch the MySQL or Oracle credentials from Secret Manager as well.
@nehanene15
data-validation validate row --source-conn CONN_NAME --target-conn bq_log_sa --tables-list testdb.billed_handset_code_sql_new=project_id.test.billed_handset_code_test --primary-keys usi_channel_id --hash '*' --filters 'usi_channel_id<3'
An alternative to storing the SA private key would be to have a dedicated GCE VM for DVT and attach the correctly privileged service account to the VM. I think that ticks both the convenience and security boxes.
I have tested using secrets for the Oracle username and password in the past, so you shouldn't have any trouble there as long as all privileges are correctly configured. I suspect you will get the same error message as in your project ID test until you figure out which permissions are missing.
@nj1973
The permissions my test user has are:
Viewer
roles/secretmanager.secretAccessor
for the specific secret I created. Example command I used for this:
gcloud secrets add-iam-policy-binding projects/12345/secrets/nj-dvt-pwd \
--member="serviceAccount:dvt-service-account@my-project.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Perhaps the Viewer role is the one that is too permissive. I used that because I am working in a test project.
Hi @nj1973, I can understand that Viewer access was fine for a test project; however, it won't be approved for us in the client subscription. So is there any way to get the full list of IAM roles/permissions required?
Hi folks,
I suggest we pause this discussion. I have created a pull request that changes the way we use secrets to store connection configuration. It is a breaking change: the value stored in the secret will need to change.
The old code saved individual keys of the connection configuration in separate secrets; the new approach is to save the complete connection configuration (user name, password, host, port, etc.) in one secret. There is also improved exception handling.
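As a rough illustration of the new approach described in this comment (the field names below are illustrative assumptions, not DVT's exact schema): the secret payload becomes the entire connection configuration serialized as one JSON document, rather than one secret per field.

```python
import json

# Hypothetical connection configuration; field names are illustrative only.
full_config = {
    "source_type": "MySQL",
    "host": "db.internal",
    "port": 3306,
    "user": "dvt_user",
    "password": "s3cret",
}

# Under the new approach, this single string is what gets stored as the
# secret's payload; DVT would fetch and parse it in one round trip.
secret_payload = json.dumps(full_config)

restored = json.loads(secret_payload)
assert restored == full_config
```

One advantage of this shape is that a missing or unreadable secret fails once, up front, with a single clear error, instead of failing per-field.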
If you want to try the new code, use this branch; it appears to work, and I would be happy to receive any bug reports. For the specific exception handling changes, compare the old code here with what we want to do here, around line 48 or so.
Thanks.
Sundar Mudupalli
Hi @nj1973, @sundar-mudupalli-work, @nehanene15, I have created a new virtual env where I cloned the mentioned branch and installed DVT.
git branch -a
* K8-indexed
google-pso-data-validator 4.2.0
But when I am trying to create a connection to the target BigQuery:
data-validation connections add --connection-name Bq_conn BigQuery --project-id prjoject01
I am getting the below error:
Traceback (most recent call last):
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/bin/data-validation", line 33, in <module>
sys.exit(load_entry_point('google-pso-data-validator==4.2.0', 'console_scripts', 'data-validation')())
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/bin/data-validation", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/local/lib/python3.9/importlib/metadata.py", line 77, in load
module = import_module(match.group('module'))
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/data_validation/__main__.py", line 23, in <module>
from data_validation import (
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/data_validation/cli_tools.py", line 55, in <module>
from data_validation import clients, consts, state_manager
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/data_validation/clients.py", line 23, in <module>
from google.cloud import bigquery
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
from google.cloud.bigquery.client import Client
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 56, in <module>
from google.resumable_media.requests import MultipartUpload # type: ignore
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/resumable_media/requests/__init__.py", line 665, in <module>
from google.resumable_media.requests.download import ChunkedDownload
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/google/resumable_media/requests/download.py", line 17, in <module>
import urllib3.response # type: ignore
File "/home/cloudbreak/dvt_clone/professional-services-data-validator/lib/python3.9/site-packages/urllib3/__init__.py", line 41, in <module>
raise ImportError(
ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips 26 Jan 2017'.
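This error comes from urllib3 v2, which refuses to import when Python's ssl module was built against OpenSSL older than 1.1.1 (common on older RHEL/CentOS images, as the 1.0.2k-fips string suggests). You can confirm what your interpreter was compiled against with a one-liner:

```python
import ssl

# urllib3 v2 requires the interpreter's ssl module to be linked against
# OpenSSL 1.1.1 or newer; print what this Python was actually built with.
print(ssl.OPENSSL_VERSION)
```

If it reports something older than 1.1.1, the usual workarounds are to pin urllib3 in the virtualenv (`pip install 'urllib3<2'`) or to use a Python build linked against a newer OpenSSL.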
@Jyotirm0y The K8-indexed branch has now been merged - can you pull the latest code and see if you're still having any issues? You should be able to use Secret Manager to store BQ and Oracle credentials.
Closing this question-labeled issue due to a long period without responses.
We tried to use Secret Manager for storing and referencing secrets in our connection configuration, as documented at https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#using-gcp-secret-manager
Below is the connection json: {"source_type": "BigQuery", "secret_manager_type": "GCP", "secret_manager_project_id": "prj-01", "project_id": "gcp-dev-sa"}
We are getting the below error when trying a row hash validation with MySQL as the source and BigQuery as the target:
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/gcp-dev-sa/jobs?prettyPrint=false: ProjectId and DatasetId must be non-empty.