Closed: chipzzz closed this issue 1 month ago
@chipzzz thank you for the detailed report!
Also, when not sending in any data I see the following:
From the logs, it looks like Relay does receive data. Is there another source sending events to the same Relay instance that might be causing the rate limit to be hit?
I can't pinpoint whether it's rate limiting, sampling, or something else.
Given the rate limiting logs I suspect it's rate limiting. You can check this by looking at stats (this page should work in self-hosted as well).
Also, is dynamic sampling playing a role here? Is that enabled in self-hosted Sentry? I don't see settings for it as specified by these docs.
No, dynamic sampling applies to transaction events, not to error events.
Shouldn't sample_rate: 1.0 prevent that if set on client side?
No, the client-side sample rate only determines client side sampling, not server-side sampling.
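To illustrate the distinction, here is a conceptual sketch of what a client-side sample rate does. It is not the SDK's actual code, and the function names are hypothetical. It only shows that sample_rate: 1.0 means the client drops nothing; whatever the server does afterwards (for example, rate limiting in Relay) is a separate step that this setting cannot influence.

```python
import random


def client_side_keep(sample_rate: float, rng: random.Random) -> bool:
    """Hypothetical helper: decide on the client whether to send an event.

    rng.random() returns a float in [0.0, 1.0), so a rate of 1.0 always
    keeps the event and a rate of 0.0 never does.
    """
    return rng.random() < sample_rate


def simulate(sample_rate: float, n_events: int, seed: int = 0) -> int:
    """Count how many of n_events would leave the client."""
    rng = random.Random(seed)
    return sum(client_side_keep(sample_rate, rng) for _ in range(n_events))


if __name__ == "__main__":
    # With sample_rate=1.0 every event is sent by the client.
    print(simulate(1.0, 1000))
```

If events still go missing with a rate of 1.0, the drop has to be happening after the client, which is why the server-side limits are the next thing to check.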
Is it possible at all to fully disable dynamic sampling for those who want all events to come through and are willing to provide the required resources for that kind of volume?
Yes, that should be possible by disabling organizations:dynamic-sampling. But my hunch is that DS is not the problem here.
How does redis play a role in sampling?
We use redis to propagate project configuration to relay, and redis counters to enforce rate limits consistently. For dynamic sampling, we only use it for reservoir sampling.
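As a rough illustration of how shared counters can enforce a limit consistently, here is a minimal fixed-window counter sketch. This is an assumption-laden toy, not Relay's implementation: the class name is hypothetical, and an in-memory dict stands in for Redis. The increment corresponds conceptually to a Redis INCR on a key that includes the time window.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Toy fixed-window rate limiter; a dict stands in for Redis keys."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)

    def allow(self, key: str, now=None) -> bool:
        """Return True if this event still fits in the current window."""
        now = time.time() if now is None else now
        bucket = int(now // self.window)  # index of the current window
        slot = (key, bucket)
        self.counters[slot] += 1  # analogous to INCR on a shared counter
        return self.counters[slot] <= self.limit


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=2, window_seconds=60)
    # Third event in the same window is rejected.
    print([limiter.allow("project-1", now=0) for _ in range(3)])
```

Because every instance increments the same counter, they all agree on when the limit is reached. In a real Redis setup the keys would also need an expiry so old windows are cleaned up.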
@jjbayer, thanks for all this detailed info, I will be looking through it. Just one question at this time: if project rate limiting is not set, the only other possible rate limiting is set through system.rate-limit, correct? Are there any other ones?
I originally had this set to 0 and am now trying other values, but that doesn't make the rate-limiting events go away.
The stats page does not show anything being dropped. I've noticed in other issues that the stats page may not always be accurate.
Besides mine, nothing else is being sent in this environment. I suspect some internal events are being sent, perhaps? I will need to verify.
I've also started looking on the SDK side; some reported issues are relevant to mine:
https://github.com/getsentry/sentry-java/issues/3494 https://github.com/getsentry/sentry-python/issues/2617
Also, it looks like organizations:dynamic-sampling is not present as a feature in my Helm chart self-hosted Sentry. I think I can safely ignore it. I also saw that there was an attempt to implement it in self-hosted, but components were missing.
@chipzzz
if project rate limiting is not set, the only other possible rate limiting is set through system.rate-limit correct? Are there any other ones?
There's a bunch of sources for rate limits. With superuser access you should be able to check http://{YOUR_SENTRY_INSTANCE}/api/0/internal/project-config/?projectId={YOUR_PROJECT_ID} and look for "quotas" to see which ones are active for your project.
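Once the JSON from that endpoint is saved, finding the "quotas" entries can be done with a small recursive search. The sketch below makes no assumption about where "quotas" sits in the response; the sample payload and its field names are hypothetical and only serve to make the example runnable.

```python
import json


def find_key(node, key):
    """Collect every value stored under `key` anywhere in nested JSON data."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending so nested occurrences are also collected.
            found.extend(find_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Hypothetical payload, NOT a real project-config response.
SAMPLE = '{"configs": {"1": {"config": {"quotas": [{"id": "example", "limit": 0}]}}}}'

quotas = find_key(json.loads(SAMPLE), "quotas")

if __name__ == "__main__":
    print(json.dumps(quotas, indent=2))
```

An empty result would suggest that no quotas are configured for the project in that response.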
@jjbayer, I don't think I can access that due to https://github.com/getsentry/sentry-docs/issues/7778 ?
Any other way to access this info? Where would I find quotas?
Created https://github.com/getsentry/self-hosted/issues/3235 as I have more findings: the "Since issue began" accumulated count of events is correct, yet "Total in last 14 days" is incorrect. When viewing the Discover time chart as well as the events list, there are events missing, and events also seem to be overwritten, as past events are no longer present while they were before.
This was caused by the Snuba Helm chart's default setting of single_node: true for Clickhouse, while Clickhouse by default is deployed as a distributed cluster.
Thus, when Snuba was querying or inserting data into Clickhouse, it was under the impression that it was a single-node deployment.
The problem is not apparent at high volume.
https://github.com/sentry-kubernetes/charts/blob/develop/charts/sentry/values.yaml#L1782
I'm trying to debug low-volume ingestion through Relay > Sentry and the resulting missing data in Sentry. Every time I push some data in, e.g. using the Python SDK, I rarely get any events in Sentry, just doing a basic sentry_sdk.init:
Sentry Version (helm chart): 24.5.1
Relay Version: getsentry/relay:24.5.1
Current config (also tried basic config without these changes):
I get the following error:
When not sending any data I keep getting rate limit errors, which I originally thought were related - not so sure anymore.
Also, when not sending in any data I see the following:
I'm trying to understand whether these errors are worth looking into, as I'm trying to diagnose why I can't see the events come through Sentry, or why they do so only very rarely at low volume.
I can't pinpoint whether it's rate limiting, sampling, or something else.
In a high-volume environment events seem to come through fine; however, I'm still seeing constant dropped envelope: rate limited messages, making me believe rate limiting is happening. I have rate limiting set to unlimited and I am not sampling on the client side. I'm trying to understand what is going on, why I'm rarely seeing events come through Sentry at low volume, and whether I'm actually rate limiting events at high volume (in my other environment).
Also, is dynamic sampling playing a role here? Is that enabled in self-hosted Sentry? I don't see settings for it as specified by these docs.
Is it possible that dynamic sampling is sampling/rate limiting data here? Shouldn't sample_rate: 1.0 prevent that if set on client side? It makes me believe there may be some dynamic sampling functionality happening; however, since I use self-hosted, its control features are not fully plugged in.
Is it possible at all to fully disable dynamic sampling for those who want all events to come through and are willing to provide the required resources for that kind of volume?
How does redis play a role in sampling?
Thank you
Sebastian