datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

datahub-gms autoscaling #11761

Open 7onn opened 3 weeks ago

7onn commented 3 weeks ago

I'm running DataHub via the Helm chart. During some large ingestion jobs, resource usage climbs until it reaches the configured limit, and the pod is throttled to the point where it can no longer answer the health check and is terminated.

To fix this, I suppose I could just increase the resource limits and let it fly. But I also think that in moments of heavy traffic like this, we could benefit from a second GMS instance to share the load. So I ask: would I have any problems running GMS with multiple replicas? Can it run in parallel? If it can, I'd be interested in contributing to the Helm chart to enable autoscaling via HPA :)
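For context on what HPA would do here, below is a minimal Python sketch of the replica calculation described in the Kubernetes HorizontalPodAutoscaler documentation: `desired = ceil(current_replicas * current_metric / target_metric)`, skipped when the ratio is within a tolerance (0.1 by default). The function name, the 70% target, and the 1 to 4 replica bounds are illustrative assumptions, not values from the DataHub Helm chart. Note that HPA measures utilization against the pod's resource requests, not its limits.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=4, tolerance=0.1):
    """Approximate the HPA scaling rule (hypothetical helper, for illustration).

    Utilization values are percentages of the pod's resource *requests*.
    The min/max bounds and tolerance defaults are assumptions for this sketch.
    """
    ratio = current_utilization / target_utilization
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured replica bounds.
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # One GMS pod at 95% CPU against an assumed 70% target during ingestion.
    print(desired_replicas(1, 95, 70))
```

In this sketch, a single pod at 95% of its requested CPU with a 70% target would be scaled to two replicas, which is the "second GMS instance" scenario described above.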

david-leifker commented 3 weeks ago

GMS does run with multiple replicas. Additionally, the primary consumers mce-consumer and mae-consumer can run as separate deployments (each of which can be run with multiple replicas).

david-leifker commented 3 weeks ago

I'd be interested in HPA updates to the Helm charts, for GMS and/or the standalone consumers!

7onn commented 2 weeks ago

Opened a PR for datahub-gms: https://github.com/acryldata/datahub-helm/pull/517/files

I could do the same for the standalone consumers too, if a DataHub maintainer gives this approach a thumbs up.