getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

Sentry database (postgresql) nodestore_node table size is growing huge. #2887

Closed amitesh2181763 closed 5 months ago

amitesh2181763 commented 6 months ago

Hi Team,

Since our initial Sentry setup, the "nodestore_node" table has kept growing. Sentry support provided us with a cleanup procedure for this table, which we have been performing ever since.

Cleanup procedure: 1) Delete data older than 30 days from the "nodestore_node" table. 2) Run the pg_repack command to reclaim the space.
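A minimal sketch of step 1, assuming the row age is stored in a column named `timestamp` (not confirmed in this thread) and that the statement is run through a DB-API driver with `%s` placeholders, such as psycopg2. The helper name `build_cleanup_statement` is hypothetical. It only builds the statement and its cutoff; executing it and the pg_repack step are left out.

```python
from datetime import datetime, timedelta, timezone

def build_cleanup_statement(retention_days=30, now=None):
    """Return (sql, params) deleting nodestore_node rows older than the cutoff."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # Assumption: the row age lives in a column named "timestamp".
    sql = 'DELETE FROM nodestore_node WHERE "timestamp" < %s'
    return sql, (cutoff,)

sql, params = build_cleanup_statement(30)
print(sql, params)
# With a psycopg2-style cursor (not run here): cursor.execute(sql, params)
```

On a table this large, a single DELETE can run for a long time, so deleting in smaller date ranges may be gentler on the database.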

Initially, our Sentry database grew by 5–6 GB per day, but over the last few months it has been growing by 10–12 GB per day. Because of this, we need to run the cleanup frequently, which requires stopping our application.

Can you please suggest whether there is any option other than database cleanup to optimize our database, and whether there is any way to limit the amount of event data stored in "nodestore_node"?

Also, what type of data is stored in the "nodestore_node" table, and is there any way to separate this data from the table?

amitesh2181763 commented 6 months ago

@hubertdeng123 @azaslavsky @BYK need your valuable suggestion in this case.

BYK commented 6 months ago

As far as I'm aware this table holds the raw event data. I've seen an S3 backend for this on GitHub somewhere but don't know how well it works.

Sentry.io itself uses Google BigTable for this (at least it used to) which makes it much less of a problem (except maybe a money pit).

Why is vacuuming not working for you?

praseeb commented 6 months ago

@BYK Thanks for the information. Does "raw event data" mean something like audit logs for every activity performed in the Sentry application through the API, user activity, log analysis, etc.? Just trying to understand what "raw event data" means. As for vacuuming, yes, that is working for us, but we are checking whether we can avoid performing it on a weekly basis, since it requires a stop and start of the application.

@amitesh2181763

praseeb commented 6 months ago

@BYK Also to reduce the incoming flow of data to this table, can we do the following?

Turn off organizations:performance-view, i.e. remove this entry from the sentry.conf.py file? Will this help us stop all of that event data from being stored in the table?

hubertdeng123 commented 6 months ago

How much your database grows per day really depends on how many events your Sentry instance is ingesting, which seems like a lot. What does your event volume look like? If you'd like to reduce incoming flow of data to the table, it may be useful to reduce the sample rate if you're sampling transactions, or rate limit the number of events that come in.
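As a rough illustration of reducing the sample rate: transaction sampling is set in each client application's SDK, not in the self-hosted server config. The sketch below assumes the Python SDK (`sentry_sdk`), whose `traces_sampler` option receives a sampling context and returns a rate between 0 and 1. The transaction names and rates are made-up examples.

```python
# Hypothetical values for illustration; tune them to your own traffic.
NOISY_TRANSACTIONS = {"/health", "/metrics"}
DEFAULT_RATE = 0.1  # keep roughly 10% of transactions

def traces_sampler(sampling_context):
    """Return the fraction of transactions to keep for this context."""
    transaction = sampling_context.get("transaction_context") or {}
    if transaction.get("name") in NOISY_TRANSACTIONS:
        return 0.0  # drop health checks and similar noise entirely
    return DEFAULT_RATE

# In the client application (requires the sentry-sdk package, not run here):
# import sentry_sdk
# sentry_sdk.init(dsn="<your DSN>", traces_sampler=traces_sampler)
```

A flat rate is also possible by passing `traces_sample_rate=0.1` to `sentry_sdk.init` instead of a sampler function.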

amitesh2181763 commented 6 months ago

Hi @hubertdeng123, thank you for your valuable response. Our Sentry PROD database is growing by 10–12 GB daily, and the table size after the last cleanup was 250 GB. Where can we check the event volume, and where do we define the sample rate and rate limits for transactions and events to reduce the incoming flow of data into the table? Also, one more question: are the error messages sent by our different projects to our Sentry application also stored in this table?
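For a sense of scale, here is a back-of-the-envelope estimate using the figures above: if roughly 10–12 GB arrives per day and rows are deleted after 30 days, the table would level off at about 30 days' worth of data. This assumes constant ingest and that the cleanup fully reclaims the deleted space; the function name is hypothetical.

```python
def steady_state_size_gb(daily_growth_gb, retention_days):
    """Approximate table size once deletions balance new inserts."""
    return daily_growth_gb * retention_days

low = steady_state_size_gb(10, 30)
high = steady_state_size_gb(12, 30)
print(f"Expected plateau: {low}-{high} GB")  # Expected plateau: 300-360 GB
```

Under these assumptions, halving either the daily ingest or the retention window would halve the plateau.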

hubertdeng123 commented 6 months ago

This page may be helpful to you in regards to rate limiting: https://develop.sentry.dev/services/quotas/

Yes, events are stored in this table.

amitesh2181763 commented 6 months ago

Hi @hubertdeng123, thank you for the update and for the rate limit suggested for the config.yml file in the document. It will help us restrict the growth of our nodestore_node table. The quota mentioned in the document basically states that it needs to be set at the Redis level. Which parameter is mandatory to set in our config.yml file?

hubertdeng123 commented 6 months ago

No problem. Please let me know if that is the solution to the issue you're facing and if there is additional help that you may require.

getsantry[bot] commented 5 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀