GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
397 stars 112 forks source link

Filter feature is not working on source: MySQL and the target: bigquery #1041

Closed Jyotirm0y closed 10 months ago

Jyotirm0y commented 10 months ago

``When trying to use the filters feature, whose source is MySQL and the destination is bigquery, the execution fails. With an exception from ibis/backends/base_sqlalchemy/compiler.py: Value error: ibis.common.exceptions.OperationNotDefinedError: No translation rule for <class 'third_party.ibis.ibis_addon.operations.RawSQL'>

error log

Traceback (most recent call last):
  File "/home/cloudbreak/python3.9_venv/bin/data-validation", line 8, in <module>
    sys.exit(main())
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/__main__.py", line 501, in main
    run_validation_configs(args)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/__main__.py", line 471, in run_validation_configs
    run_validations(args, config_managers)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/__main__.py", line 424, in run_validations
    run_validation(config_manager, verbose=args.verbose)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/__main__.py", line 413, in run_validation
    validator.execute()
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/data_validation.py", line 89, in execute
    result_df = self.execute_recursive_validation(
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/data_validation.py", line 218, in execute_recursive_validation
    self._execute_validation(
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/data_validation/data_validation.py", line 310, in _execute_validation
    source_df = self.config_manager.source_client.execute(source_query)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/client.py", line 220, in execute
    query = self._get_query(query_ast, **kwargs)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/client.py", line 226, in _get_query
    return self.query_class(self, dml, **kwargs)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/client.py", line 40, in __init__
    self.compiled_sql = sql.compile()
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 55, in compile
    compiled_queries = [q.compile() for q in self.queries]
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 55, in <listcomp>
    compiled_queries = [q.compile() for q in self.queries]
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/alchemy.py", line 1333, in compile
    frag = step(frag)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/alchemy.py", line 1420, in _add_where
    args = [
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/alchemy.py", line 1421, in <listcomp>
    self._translate(pred, permit_subquery=True) for pred in self.where
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1598, in _translate
    return translator.get_result()
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1362, in get_result
    translated = self.translate(self.expr)
  File "/home/cloudbreak/python3.9_venv/lib/python3.9/site-packages/ibis/backends/base_sqlalchemy/compiler.py", line 1403, in translate
    raise com.OperationNotDefinedError(
ibis.common.exceptions.OperationNotDefinedError: No translation rule for <class 'third_party.ibis.ibis_addon.operations.RawSQL'>

We tried to use the filter with both approaches passing it as parameter and also passing it explicitly on a YAML file but got the above error for both the cases.

  1. When passing it as parameter: data-validation validate row --source-conn CONN_NAME --target-conn bq_conn --tables-list testdb.billed_handset_code_sql_new=$projectid.dbbqq.billed_handset_code_test --comparison-fields usi --primary-keys usi_channel_id --filters 'usi_channel_id=2'

  2. When passing it explicitly on a YAML file

result_handler: {}
  source: CONN_NAME
  target: bq_conn
  validations:
  - calculated_fields: []
    comparison_fields:
    - cast: null
      field_alias: usi
      source_column: usi
      target_column: usi
    dependent_aliases:
    - usi
    - usi_channel_id
    filters:
    - source: usi_channel_id=2
      target: usi_channel_id=2
      type: custom
    format: table
    labels: []
    primary_keys:
    - cast: null
      field_alias: usi_channel_id
      source_column: usi_channel_id
      target_column: usi_channel_id
    random_row_batch_size: null
    schema_name: testdb
    table_name: billed_handset_code_sql_new
    target_schema_name: projectid.dbbqq
    target_table_name: billed_handset_code_test
    threshold: 0.0
    type: Row
    use_random_rows: false

But when I try to use the filter parameter whose source is bigquery and the destination is bigquery it works properly:

  1. When passing it as parameter : data-validation validate row --source-conn bq_conn --target-conn bq_conn --tables-list projectid.dbbqq.billed_handset_code_test_dummy=projectid.dbbqq.billed_handset_code_test --comparison-fields usi --primary-keys usi_channel_id --filters 'usi_channel_id=2'
  2. When passing it explicitly on a YAML file:
    result_handler: {}
    source: bq_conn
    target: bq_conn
    validations:
    - calculated_fields: []
    comparison_fields:
    - cast: null
    field_alias: usi
    source_column: usi
    target_column: usi
    dependent_aliases:
    - usi
    - usi_channel_id
    filters:
    - source: usi_channel_id=2
    target: usi_channel_id=2
    type: custom
    format: table
    labels: []
    primary_keys:
    - cast: null
    field_alias: usi_channel_id
    source_column: usi_channel_id
    target_column: usi_channel_id
    random_row_batch_size: null
    schema_name: projectid.dbbqq
    table_name: billed_handset_code_test_dummy
    target_schema_name: projectid.dbbqq
    target_table_name: billed_handset_code_test
    threshold: 0.0
    type: Row
    use_random_rows: false
nehanene15 commented 10 months ago

Could you update to the latest version (4.2.0) and retry this?

nj1973 commented 10 months ago

@Jyotirm0y, I believe upgrading resolved this issue, is that correct?