apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Failing Test]: BigQuery load and copy load exception tests failing with new GCS client #26334

Open BjornPrime opened 1 year ago

BjornPrime commented 1 year ago

What happened?

The apitools-generated StorageV1 client has been replaced with the GCS client. Unfortunately, the mocks for test_load_job_exception and test_copy_load_job_exception in bigquery_test.py target StorageV1's ObjectsService, so those tests now fail. Since the new client has no ObjectsService or clear analog to it, it is not obvious what to mock instead.

Issue Failure

Failure: Test is continually failing

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

tvalentyn commented 1 year ago

What happens if you don't mock ObjectsService? I suspect the purpose of mocking was to silence calls to GCS. You might be able to mock some portions of GCS IO instead.
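To illustrate the suggestion of mocking at the GCS IO layer rather than at the low-level storage client, here is a minimal, self-contained sketch. Every name in it (GcsIO, BigQueryClient, start_load_job, write_then_load) is a hypothetical stand-in, not Beam's actual API; it only shows the general shape of patching a wrapper's file-opening method so the test no longer depends on which storage client sits underneath.

```python
from unittest import mock


class GcsIO:
    """Stand-in for a GCS wrapper; NOT Beam's real class."""

    def open(self, path, mode="w"):
        raise RuntimeError("would contact GCS")


class BigQueryClient:
    """Stand-in for a BigQuery wrapper; NOT Beam's real class."""

    def start_load_job(self, table, source_uris):
        raise RuntimeError("would contact BigQuery")


def write_then_load(gcs, bq, table, temp_path, rows):
    """Hypothetical code under test: stage rows to GCS, then start a load job."""
    with gcs.open(temp_path, "w") as f:
        for row in rows:
            f.write(row)
    return bq.start_load_job(table, [temp_path])


def run_with_mocked_gcs():
    # Patch the wrapper's open() so no storage call is made, and make the
    # load job raise, mirroring a "load job exception" style of test.
    with mock.patch.object(GcsIO, "open", mock.mock_open()) as fake_open, \
         mock.patch.object(BigQueryClient, "start_load_job") as fake_load:
        fake_load.side_effect = ValueError("load failed")
        try:
            write_then_load(
                GcsIO(), BigQueryClient(),
                "project:dataset.table", "gs://temp_location/file", ["row"])
        except ValueError as e:
            return fake_open.call_count, fake_load.call_count, str(e)
    return None
```

Whether an equivalent patch point exists in Beam's GCS IO code is an assumption that would need checking against the actual module.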

cc: @Abacn

BjornPrime commented 1 year ago

If we don't mock ObjectsService, the tests fail because mock_load_job is never called. I've tried to figure out which GCS IO method to mock instead, but haven't had any luck getting the tests to pass.

tvalentyn commented 1 year ago

That sounds strange. mock_load_job seems to be a method of the BigQuery client. Do we understand why this method is no longer called?

If you run the following pipeline (taken from the test):

      _ = (
          p
          | beam.Create([{
              'columnA': 'value1'
          }])
          | WriteToBigQuery(
              table='project:dataset.table',
              schema={
                  'fields': [{
                      'name': 'columnA', 'type': 'STRING', 'mode': 'NULLABLE'
                  }]
              },
              create_disposition='CREATE_NEVER',
              custom_gcs_temp_location="gs://temp_location",
              method='FILE_LOADS'))

is bigquery_v2_client.BigqueryV2.JobsService.Insert still called? You can insert an assertion failure inside bigquery_v2_client.BigqueryV2.JobsService.Insert to verify.
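An alternative to editing the client source is to patch the method temporarily so that it raises as soon as it is reached. The sketch below shows the idea with unittest.mock on a stand-in class; JobsService and submit here are hypothetical placeholders defined in the snippet, not the real bigquery_v2_client classes.

```python
from unittest import mock


class JobsService:
    """Stand-in for a generated API service class; NOT the real client."""

    def Insert(self, request):
        return {"status": "DONE", "request": request}


def submit(service, request):
    """Hypothetical caller that may or may not reach Insert."""
    return service.Insert(request)


def probe_insert_called():
    # Replace Insert with a mock that raises as soon as it is invoked,
    # so reaching the method is impossible to miss.
    with mock.patch.object(
            JobsService, "Insert",
            side_effect=AssertionError("Insert was reached")) as probe:
        try:
            submit(JobsService(), {"job": "load"})
        except AssertionError as e:
            return probe.called, str(e)
    # No exception means the code path never touched Insert.
    return False, ""
```

If the probe never fires when the same patch is applied around the pipeline above, the code path is stopping before the load job is submitted, which would point back at the GCS staging step.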