cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0

Is argument -validation.reject-old-samples.max-age valid for distributor or ingester or both? #6220

Open mousimin opened 6 days ago

mousimin commented 6 days ago

Describe the bug

I run Cortex in microservices mode. The ingester does not set -validation.reject-old-samples or -validation.reject-old-samples.max-age, but the distributor sets both:

  -validation.reject-old-samples=true \
  -validation.reject-old-samples.max-age=28d

I use VictoriaMetrics to remote write metrics to Cortex, and I got a "too old sample" error, which I guess comes from the ingester:

  {"ts":"2024-09-18T03:21:11.282Z","level":"error","caller":"VictoriaMetrics/app/vmagent/remotewrite/client.go:400","msg":"sending a block with size 1680 bytes to \"1:secret-url\" was rejected (skipping the block): status code 400; response body: maxFailure (quorum) on a given error family, rpc error: code = Code(400) desc = addr=15.132.24.168:2012 state=ACTIVE zone=zone1, rpc error: code = Code(400) desc = user=user1: err: too old sample. timestamp=2024-09-13T09:08:53.521Z, series={...}"}

I checked the code, and I think the distributor validates the max-age value: https://github.com/cortexproject/cortex/blob/master/pkg/distributor/distributor.go#L557

For the ingester, I know it mainly reuses Prometheus code, but I could not find any code that performs the same validation as the distributor. Instead, I found the -ingester.out-of-order-time-window argument, which I set to 6h for the ingester.

Do I need to add -validation.reject-old-samples.max-age to the ingester as well? Or should -ingester.out-of-order-time-window use the same value as -validation.reject-old-samples.max-age? Or is this a bug?

To Reproduce

Steps to reproduce the behavior:

  1. Start Cortex v1.17.1 in microservices mode, and configure -validation.reject-old-samples and -validation.reject-old-samples.max-age only for the distributor.
  2. Start VictoriaMetrics or Prometheus and remote write some old metrics.

Expected behavior

No error should appear.

Environment:

CharlieTLe commented 6 days ago

All components should typically have the same flags and values set. Perhaps the only time they should differ is when availability zone awareness is enabled and you want to set which availability zone a component is in.

mousimin commented 5 days ago

Thanks for your reply, @CharlieTLe. Taking limits_config as an example, if we check the naming convention of the "CLI flag" for each item, we have these prefixes:

So the ones that start with distributor or ingester are only valid for the distributor or the ingester, but the ones that start with validation are valid for all components. Am I right?

CharlieTLe commented 4 days ago

Yes, they are valid for all components.

mousimin commented 1 day ago

OK, going back to my issue: I found that the default value of -validation.reject-old-samples is false, so we shouldn't get the "too old sample" error from the ingester, right? Are there any flags I missed?

yeya24 commented 1 day ago

The reject-old-samples flag is valid for all components, but the validation is only done at the distributor. If you got a too old sample error at the ingester, that means you are trying to ingest a sample into TSDB that is older than the TSDB min time.

If you want to tolerate this error, you need to enable out_of_order_time_window for the ingester.