datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

OOM (Out of Memory Errors) in DataHub 0.13+ when ingesting Bigquery metadata #11147

Open vejeta opened 1 month ago

vejeta commented 1 month ago

Describe the bug

We have an ingestion job running periodically in Kubernetes; it runs fine with DataHub 0.12.x versions.

[Screenshot: memory usage during the job on DataHub 0.12.0]

You can see the memory stays stable under 1GiB during the execution of the job.

However, with DataHub 0.13.x versions the job always fails with error 137 (out of memory).

[Screenshot: job terminated with error 137 (OOM)]

We have tried increasing the memory limit to 20GiB, but there must be a memory leak, because the job always runs out of memory.
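To make the growth visible from inside the container, something like the following minimal sketch (plain Python standard library; the 60-second interval and the log format are arbitrary placeholders, not part of our actual job) can log peak RSS alongside the ingestion:

import resource
import threading
import time

def log_peak_rss(interval_seconds: int = 60) -> None:
    # ru_maxrss is reported in kilobytes on Linux.
    while True:
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"[mem-watch] peak RSS: {peak_kb / 1024:.1f} MiB", flush=True)
        time.sleep(interval_seconds)

# Start the watcher as a daemon thread before kicking off the ingestion,
# so the job logs show whether memory keeps climbing until the 137 kill.
threading.Thread(target=log_peak_rss, daemon=True).start()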

To Reproduce

Steps to reproduce the behavior:

Note: we have also tried a build from the latest commit on master, and the issue is still present.
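For reference, the job is essentially a BigQuery source feeding a datahub-rest sink. A minimal sketch of that kind of pipeline via DataHub's programmatic ingestion API is below; the project ID, GMS URL, and the lineage/usage flags are placeholders standing in for our real recipe, which is not included here:

from datahub.ingestion.run.pipeline import Pipeline

# Placeholder recipe approximating the setup: BigQuery source with lineage and
# usage extraction, writing to a datahub-rest sink.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "bigquery",
            "config": {
                "project_ids": ["my-gcp-project"],  # placeholder
                "include_table_lineage": True,
                "include_usage_statistics": True,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},  # placeholder
        },
    }
)
pipeline.run()
pipeline.raise_from_status()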

Expected behavior

No out-of-memory error in DataHub 0.13.3.



Additional context

Log summary after a successful execution with DataHub 0.12.x:

 'num_total_lineage_entries': {'host': 108696},
 'num_skipped_lineage_entries_missing_data': {'host': 14177},
 'num_skipped_lineage_entries_not_allowed': {'host': 90366},
 'num_lineage_entries_sql_parser_failure': {'host': 147},
 'num_skipped_lineage_entries_other': {},
 'num_lineage_total_log_entries': {'*': 108696},
 'num_lineage_parsed_log_entries': {'*': 108696},
 'lineage_metadata_entries': {'*': 99},
 'num_usage_total_log_entries': {'*': 175145},
 'num_usage_parsed_log_entries': {'*': 122613},
 'sampled': '10 sampled of at most 2880 entries.'},
 'total_query_log_entries': 0,
 'audit_log_api_perf': {'get_exported_log_entries': '397.101 seconds', 'list_log_entries': None},
 'num_total_lineage_entries': {'host': 108696},
 'num_skipped_lineage_entries_missing_data': {'host': 14177},
 'num_skipped_lineage_entries_not_allowed': {'host': 90366},
 'num_lineage_entries_sql_parser_failure': {'host': 147},
 'num_skipped_lineage_entries_other': {},
 'num_lineage_total_log_entries': {'*': 108696},
 'num_lineage_parsed_log_entries': {'*': 108696},
 'lineage_metadata_entries': {'*': 99},
 'num_usage_total_log_entries': {'*': 175145},
 'num_usage_parsed_log_entries': {'*': 122613},
-----
 'sampled': '10 sampled of at most 3059 entries.'},
 'partition_info': {},
----
 'lineage_start_time': '2024-08-10 00:00:00+00:00 (1 day, 3 hours and 26 minutes ago)',
 'lineage_end_time': '2024-08-11 00:00:08.942642+00:00 (3 hours, 26 minutes and 30.95 seconds ago)',
 'stateful_lineage_ingestion_enabled': "default=True description='Enable stateful lineage ingestion. This will store lineage window timestamps after "
                                       "successful lineage ingestion. and will not run lineage ingestion for same timestamps in subsequent run. '"
                                       'extra={}',
 'usage_start_time': '2024-08-10 00:00:00+00:00 (1 day, 3 hours and 26 minutes ago)',
 'usage_end_time': '2024-08-11 00:00:08.942642+00:00 (3 hours, 26 minutes and 30.95 seconds ago)',
 'stateful_usage_ingestion_enabled': True,
 'start_time': '2024-08-11 00:00:08.984329 (3 hours, 26 minutes and 30.91 seconds ago)',
 'running_time': '3 hours, 26 minutes and 30.91 seconds'}
Sink (datahub-rest) report:
{'total_records_written': 33546,
 'records_written_per_second': 2,
 'warnings': [],
 'failures': [],
 'start_time': '2024-08-11 00:00:03.217296 (3 hours, 26 minutes and 36.68 seconds ago)',
 'current_time': '2024-08-11 03:26:39.897462 (now)',
 'total_duration_in_seconds': 12396.68,
 'gms_version': 'v0.13.0',
 'pending_requests': 0}
{}
Pipeline finished with at least 5 warnings; produced 33546 events in 3 hours, 26 minutes and 30.91 seconds.
vejeta commented 3 weeks ago

Same behaviour in 0.14.x (tested in 0.14.0.1, 0.14.0.2)