vejeta opened 1 month ago
Describe the bug
We have an ingestion job running periodically in Kubernetes; it runs fine with DataHub 0.12.x versions.
You can see that memory usage stays stable under 1 GiB during the execution of the job.
However, with DataHub 0.13.x versions it always fails with error 137, out of memory.
We have tried increasing the memory limit to 20 GiB, but there must be a memory leak, because it always runs out of memory.
To Reproduce
Steps to reproduce the behavior:
```yaml
- name: bigquery
  image: 'acryldata/datahub-ingestion:v0.13.3'
  imagePullPolicy: Always
  args: ["ingest", "-c", "/recipes/bq-recipe.yml"]
```
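For reference, a minimal sketch of how a memory ceiling like the 20 GiB mentioned above can be expressed on this container through a Kubernetes `resources` block. The request value is illustrative and not taken from our actual manifest; the 20Gi limit corresponds to the figure above, which the job still exceeds (exit code 137):

```yaml
- name: bigquery
  image: 'acryldata/datahub-ingestion:v0.13.3'
  imagePullPolicy: Always
  args: ["ingest", "-c", "/recipes/bq-recipe.yml"]
  resources:
    requests:
      memory: "4Gi"     # illustrative value, not from our actual manifest
    limits:
      memory: "20Gi"    # even at this limit the job is OOM-killed (exit code 137)
```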
Note: we have also tried the latest release built from the latest commit on master, and the issue is still present.
Our BigQuery recipe
```yaml
source:
  type: bigquery
  config:
    project_on_behalf: {{ bq_slots_project }}
    project_id_pattern:
      allow:
        - .*{{ gcp_project }}
    dataset_pattern:
      allow:
        - {{ profile_dataset }}
      deny:
        - ^temp_.*
        - .*_temp$
        - .*_temp_.*
        - .*-temp.*
        - .*temporary.*
    use_exported_bigquery_audit_metadata: true
    bigquery_audit_metadata_datasets:
      - {{ gcp_project }}.bigquery_audit_log
    use_date_sharded_audit_log_tables: true
    upstream_lineage_in_report: true
    include_usage_statistics: true
    capture_table_label_as_tag: true
    capture_dataset_label_as_tag: true
    extract_column_lineage: true
    convert_urns_to_lowercase: true
    profiling:
      enabled: "true"
      profile_table_size_limit: null
      profile_table_row_limit: null
      use_sampling: false
      partition_profiling_enabled: false
      include_field_mean_value: false
      include_field_median_value: false
      include_field_sample_values: false
      include_field_stddev_value: false
    stateful_ingestion:
      enabled: true
      state_provider:
        type: "datahub"
        config:
          datahub_api:
            server: "http://our-gms:8080"
pipeline_name: {{ pipeline }}
sink:
  type: "datahub-rest"
  config:
    server: "http://our-gms:8080"
```
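To help isolate which phase of the run drives the memory growth, a stripped-down variant of the recipe above could be run with the heavier extractors switched off one at a time. This is only a diagnostic sketch reusing config keys that already appear in our recipe, not a configuration we have validated:

```yaml
source:
  type: bigquery
  config:
    project_on_behalf: {{ bq_slots_project }}
    project_id_pattern:
      allow:
        - .*{{ gcp_project }}
    # Toggle the memory-heavy phases individually to see which one leaks
    include_usage_statistics: false
    extract_column_lineage: false
    profiling:
      enabled: "false"
sink:
  type: "datahub-rest"
  config:
    server: "http://our-gms:8080"
```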
Expected behavior
Not having an out-of-memory error in DataHub 0.13.3.
Additional context
Logs with a summary after a successful execution with DataHub 0.12.x:
```
'num_total_lineage_entries': {'host': 108696},
'num_skipped_lineage_entries_missing_data': {'host': 14177},
'num_skipped_lineage_entries_not_allowed': {'host': 90366},
'num_lineage_entries_sql_parser_failure': {'host': 147},
'num_skipped_lineage_entries_other': {},
'num_lineage_total_log_entries': {'*': 108696},
'num_lineage_parsed_log_entries': {'*': 108696},
'lineage_metadata_entries': {'*': 99},
'num_usage_total_log_entries': {'*': 175145},
'num_usage_parsed_log_entries': {'*': 122613},
'sampled': '10 sampled of at most 2880 entries.'},
'total_query_log_entries': 0,
'audit_log_api_perf': {'get_exported_log_entries': '397.101 seconds', 'list_log_entries': None},
'num_total_lineage_entries': {'host': 108696},
'num_skipped_lineage_entries_missing_data': {'host': 14177},
'num_skipped_lineage_entries_not_allowed': {'host': 90366},
'num_lineage_entries_sql_parser_failure': {'host': 147},
'num_skipped_lineage_entries_other': {},
'num_lineage_total_log_entries': {'*': 108696},
'num_lineage_parsed_log_entries': {'*': 108696},
'lineage_metadata_entries': {'*': 99},
'num_usage_total_log_entries': {'*': 175145},
'num_usage_parsed_log_entries': {'*': 122613},
-----
'sampled': '10 sampled of at most 3059 entries.'},
'partition_info': {},
----
'lineage_start_time': '2024-08-10 00:00:00+00:00 (1 day, 3 hours and 26 minutes ago)',
'lineage_end_time': '2024-08-11 00:00:08.942642+00:00 (3 hours, 26 minutes and 30.95 seconds ago)',
'stateful_lineage_ingestion_enabled': "default=True description='Enable stateful lineage ingestion. This will store lineage window timestamps after successful lineage ingestion. and will not run lineage ingestion for same timestamps in subsequent run.' extra={}",
'usage_start_time': '2024-08-10 00:00:00+00:00 (1 day, 3 hours and 26 minutes ago)',
'usage_end_time': '2024-08-11 00:00:08.942642+00:00 (3 hours, 26 minutes and 30.95 seconds ago)',
'stateful_usage_ingestion_enabled': True,
'start_time': '2024-08-11 00:00:08.984329 (3 hours, 26 minutes and 30.91 seconds ago)',
'running_time': '3 hours, 26 minutes and 30.91 seconds'}
Sink (datahub-rest) report:
{'total_records_written': 33546,
 'records_written_per_second': 2,
 'warnings': [],
 'failures': [],
 'start_time': '2024-08-11 00:00:03.217296 (3 hours, 26 minutes and 36.68 seconds ago)',
 'current_time': '2024-08-11 03:26:39.897462 (now)',
 'total_duration_in_seconds': 12396.68,
 'gms_version': 'v0.13.0',
 'pending_requests': 0}
Pipeline finished with at least 5 warnings; produced 33546 events in 3 hours, 26 minutes and 30.91 seconds.
```
Same behaviour in 0.14.x (tested in 0.14.0.1, 0.14.0.2)