airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Auto delete job history logs after specific period #39358

Open venkateshkelevo opened 2 months ago

venkateshkelevo commented 2 months ago

Topic

Auto delete job history logs after specific period

Relevant information

We deployed the Airbyte platform with Helm charts in a Kubernetes environment. We set the flag TEMPORAL_HISTORY_RETENTION_IN_DAYS="7" to auto-delete the job history logs after 7 days, but the cleanup is not happening and the logs are still available. How can we achieve this?

helm command: helm install -n foresight-infra airbyte airbyte/airbyte --set global.env_vars.TEMPORAL_HISTORY_RETENTION_IN_DAYS="7"

Thanks.

marcosmarxm commented 2 months ago

@venkateshkelevo this flag cleans up the Temporal database logs. Are there any other logs you want to clean up besides those?

venkateshkelevo commented 2 months ago

Our basic problem is that the used space on the PVC of the airbyte-db-0 pod is slowly increasing. I assume the job history logs and metadata are stored in the DB and that this is why the used space keeps growing. We would like to clean up the DB space automatically.

We configured a connector to pull data from a REST source on a cron schedule that runs every two minutes. The attached screenshot shows the logs I am referring to.

image

Please suggest a way to automatically clean up the DB PVC space.

Thanks.
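
For a sense of scale, the two-minute schedule described above can be turned into a run count. This is only a sketch of the arithmetic; the function name is illustrative, and how much database space each run occupies is not something this thread establishes.

```python
def runs_per_window(interval_minutes: int, window_days: int = 1) -> int:
    """Count the scheduled syncs that fire in `window_days` at a fixed interval."""
    minutes_per_day = 24 * 60
    return (minutes_per_day // interval_minutes) * window_days


# A sync every 2 minutes, counted over one day and over a 7-day window.
per_day = runs_per_window(2)
per_week = runs_per_window(2, window_days=7)
print(per_day, per_week)
```

Whatever the per-run footprint turns out to be, it is multiplied by several hundred runs per day for this one connection.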

gingeard commented 1 week ago

We have a similar situation: after a week of data processing with only one connection set up, Airbyte's Postgres DB has already consumed 1 GB:

  1. db-airbyte db:
image
  2. temporal db:
image

If I understand correctly, setting the TEMPORAL_HISTORY_RETENTION_IN_DAYS parameter can help clean up the temporal db, but not the db-airbyte one.
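
To find out which tables account for the growth in either database, one option is to ask Postgres directly. The following is a minimal sketch, assuming the third-party psycopg2 driver is installed and the database is reachable from where the script runs; the DSN in the usage comment is a placeholder, not a value from this thread. The query relies only on the standard pg_statio_user_tables view and the pg_total_relation_size function, not on anything Airbyte-specific.

```python
# Lists the largest user tables in the connected database.
TABLE_SIZE_QUERY = """
SELECT relname, pg_total_relation_size(relid) AS total_bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY total_bytes DESC
LIMIT %s;
"""


def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units, capped at GiB."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def largest_tables(dsn: str, limit: int = 10):
    """Return (table_name, total_bytes) rows, largest first."""
    import psycopg2  # third-party; imported lazily so the helpers above work without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(TABLE_SIZE_QUERY, (limit,))
            return cur.fetchall()
    finally:
        conn.close()


# Usage (placeholder DSN, adjust to your deployment):
# for name, total in largest_tables("postgresql://USER:PASSWORD@localhost:5432/DBNAME"):
#     print(f"{name:40s} {human_size(total)}")
```

Running it once against each database would show whether the space is going to a few large tables or is spread evenly.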