rmoriz closed this 6 years ago.
You know, that's a very good question. There's already a setting to delete log entries, so adding similar ones for node statuses and reports is totally reasonable (I've even been bitten by the reports filling everything up, but, like you, I just set up a cron job to clear them out). I'll get that added before sending 0.11.6 out. With sandboxes I'm not quite sure, but I'll check that out too. I have a feeling that they aren't useful for very long at all, but I haven't done much in there for a while.
👍 I just realized the log_infos TOAST size is about 55GB, which easily wins the gold medal over reports (just 501MB) and search_items (a mere 379MB). The node has been running for 15 months.
Urk. I've added some new options for purging that kind of data, but they still need testing. (I've once again had stuff come up that needed to be dealt with, plus I want to get that cookbook issue mentioned elsewhere out of the way, and I've been dragging my feet on writing a proper test for it with go test, because creating a cookbook that way is awful.)
I've been thinking about this issue again (really) for the last few days after seeing this comment, and while log_infos needs some pretty serious refactoring, I noticed one thing that might help right off the bat. When I wrote that feature I don't think I realized how much extraneous information it would create, especially for people who run Chef as a cron job, and I didn't think about it much afterwards (so thanks for bringing it up). Anyway, along with periodic purging, it looks like I should at the very least have set log_infos up to store the data differently: I set the storage type in Postgres to EXTERNAL, but it may have been better to use EXTENDED for this.
If either of you has a table with this data handy, would you mind making a copy of the table (presumably with a subset of the data), altering its storage to EXTENDED, and seeing if it makes a difference? I'll try it out too, but it may take a little while to generate some data for it.
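For anyone trying this experiment, here is a minimal Go sketch that prints the PostgreSQL statements for it. It assumes the column under test is log_infos.extended_info and that a hypothetical id column can be used to pick the newest rows; the scratch table names are made up. EXTERNAL allows out-of-line TOAST storage without compression, while EXTENDED allows both, and ALTER COLUMN ... SET STORAGE only affects rows written after the change, so the sketch sets the storage on empty copies before inserting any data.

```go
package main

import (
	"fmt"
	"strings"
)

// storageExperimentSQL returns statements that copy the newest `limit` rows of
// `table` into two scratch tables, one per storage strategy for `column`, and
// then compare their total on-disk sizes (heap + TOAST + indexes).
// Assumption: `orderBy` (e.g. a hypothetical "id" column) identifies newest rows.
func storageExperimentSQL(table, column, orderBy string, limit int) []string {
	var stmts []string
	for _, storage := range []string{"EXTERNAL", "EXTENDED"} {
		scratch := fmt.Sprintf("%s_%s_test", table, strings.ToLower(storage))
		stmts = append(stmts,
			// Copy the column definitions and their storage settings, but no rows.
			fmt.Sprintf("CREATE TABLE %s (LIKE %s INCLUDING STORAGE);", scratch, table),
			// Set the strategy before inserting: SET STORAGE does not rewrite existing rows.
			fmt.Sprintf("ALTER TABLE %s ALTER COLUMN %s SET STORAGE %s;", scratch, column, storage),
			fmt.Sprintf("INSERT INTO %s SELECT * FROM %s ORDER BY %s DESC LIMIT %d;", scratch, table, orderBy, limit),
		)
	}
	// pg_total_relation_size counts the table, its TOAST table, and its indexes.
	stmts = append(stmts, fmt.Sprintf(
		"SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) FROM pg_class WHERE relname IN ('%s_external_test', '%s_extended_test');",
		table, table))
	return stmts
}

func main() {
	for _, s := range storageExperimentSQL("log_infos", "extended_info", "id", 10000) {
		fmt.Println(s)
	}
}
```

The printed statements can be pasted into psql against a scratch copy of the database; the last query shows the two sizes side by side.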
TBH I'm not interested in keeping run results for eternity. What I'm interested in is having the whole infra backed up periodically, and so far the goiardi database backup takes the most resources.
Nevertheless, changing it to EXTENDED makes sense.
I've changed log_infos.extended_info to use EXTENDED storage, and I'll give it a week to collect some more data.
@julian7 This is a little embarrassing, but I went to look at time-based log_infos purging and realized I had in fact set up an optional argument for purging entries when I originally added the feature (this is what I get for adding things I don't always use). It's -K or --log-event-keep on the command line, log-event-keep in the config file, or $GOIARDI_LOG_EVENT_KEEP as an environment variable.
That said, those log_infos entries can still take up way too much space. Using EXTENDED helps a bit, but I'm looking into better ways to shrink them.
GOIARDI_LOG_EVENT_KEEP does not purge node_statuses or sandboxes for me with a PG-based setup. I also wonder if this could be the reason for the memory "leak" when using the Go-based data store.
The wheels turn slowly, but they've started turning again. I've pushed up another prerelease with a simplified approach to the log info memory usage (optionally skipping recording that information), after backing off of more complicated ideas like storing diffs, at least for now. Work on node status and report purging has actually started now too.
It needs more testing, but so far it looks good.
Node statuses and reports should now be dealt with in the latest release (yay). I haven't been able to get an answer on the sandbox issue yet, though, so for now I'm leaving that be. I'm closing this out for now, but I'll keep sandbox cleaning in mind.
@ctdk you had more time than I have :) I'll look into this as soon as I can.
When using PostgreSQL, those records won't get cleaned up/purged. For node_statuses and reports, goiardi users may have custom requirements to keep old records forever, so this is not an issue: I can add a cron job to delete very old entries. However, I'm not sure about the rows in sandboxes. Can I delete them after a while, too? Thanks :)