ctdk / goiardi

A Chef server written in Go, able to run entirely in memory, with optional persistence by saving the in-memory data to disk or by using MySQL or Postgres as the data storage backend. Docs: http://goiardi.readthedocs.io/en/latest/index.html
http://goiardi.gl
Apache License 2.0

Q: housekeeping of node_statuses/reports/sandboxes #56

Closed rmoriz closed 6 years ago

rmoriz commented 7 years ago

When using PostgreSQL, those records won't get cleaned up or purged. For node_statuses and reports, goiardi users may have custom requirements to keep old records forever, so that's not an issue - I can add a cron job to delete very old entries.
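For reference, such a cron job might look like the sketch below. The table and column names (`updated_at`, `end_time`, the `goiardi` schema) are assumptions about goiardi's Postgres schema, not confirmed by this thread - check them with `\d` in psql before relying on this.

```shell
# Hypothetical crontab entry: once a day at 03:00, purge rows older than 90 days.
# Table/column names are assumed; verify against your goiardi schema first.
0 3 * * * psql -d goiardi -c "DELETE FROM goiardi.node_statuses WHERE updated_at < now() - interval '90 days'; DELETE FROM goiardi.reports WHERE end_time < now() - interval '90 days';"
```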

However I'm not sure about the rows in sandboxes. Can I delete them after a while, too?

Thanks :)

ctdk commented 7 years ago

You know, that's a very good question. There's already a setting to delete log entries, so similar ones for node statuses and reports are totally reasonable (I've even been bitten by the reports filling everything up, but, like you, just set up a cron to clear them out). I'll get that added before sending 0.11.6 out. With sandboxes I'm not quite sure, but I'll check that out too. I have a feeling that they aren't useful for very long at all, but I haven't done much in there for a while.

julian7 commented 7 years ago

👍 I just realized our log_infos TOAST size is about 55GB, which takes the gold medal ahead of reports, at just 501MB, and search_items, at a mere 379MB. The node has been running for 15 months.

ctdk commented 7 years ago

Urk. I've added some new options for purging that kind of data, but they still need testing. (I've once again had stuff come up that needed to be dealt with, plus I want to get that cookbook issue mentioned elsewhere out of the way, and I've been dragging my feet on writing a proper `go test` test for it, because creating a cookbook that way is awful.)

I was thinking about this issue again after seeing this comment (really) for the last few days, and while log_infos needs some pretty serious refactoring, I noticed one thing that might help right off the bat. When I wrote that feature, I don't think I realized how much extraneous information it would create, especially for people who run chef as a cron, and I didn't think about it much afterwards (so thanks for bringing it up). Anyway, along with periodic purging, it looks like I should at the very least have set log_infos up to store the data differently - I set the column's storage type in Postgres to EXTERNAL, but EXTENDED may have been a better choice for this.

If either of you have a table with this data handy, would you mind making a copy of the table (presumably with a subset of the data) and seeing whether altering the storage to EXTENDED makes a difference? I'll try it out too, but it may take a little while to generate some data for it.
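A sketch of that experiment might look like the following. The copy-table name is made up, and the `goiardi` schema qualification is an assumption; the `extended_info` column name comes from later in this thread. In PostgreSQL, EXTERNAL stores large values out-of-line uncompressed, while EXTENDED compresses them first, which is why it can shrink TOAST size here.

```shell
# Sketch: copy a subset of log_infos, switch the TOAST strategy, compare sizes.
# Run against a copy, never production; names are assumptions.
psql -d goiardi <<'SQL'
CREATE TABLE goiardi.log_infos_copy AS
  SELECT * FROM goiardi.log_infos LIMIT 100000;
-- Change the storage strategy for the large column
ALTER TABLE goiardi.log_infos_copy
  ALTER COLUMN extended_info SET STORAGE EXTENDED;
-- SET STORAGE only affects newly written values, so rewrite existing rows
UPDATE goiardi.log_infos_copy SET extended_info = extended_info;
VACUUM FULL goiardi.log_infos_copy;
-- Compare this against pg_total_relation_size of the original table
SELECT pg_size_pretty(pg_total_relation_size('goiardi.log_infos_copy'));
SQL
```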

julian7 commented 7 years ago

TBH I'm not interested in keeping run results for eternity. What I'm interested in is having the whole infra backed up periodically, and so far the goiardi database backup takes the most resources.

Nevertheless, changing it to EXTENDED makes sense.

I've changed log_infos.extended_info to use EXTENDED storage, and I'll give it a week to collect some more data.

ctdk commented 7 years ago

@julian7 This is a little embarrassing, but I went to look at time-based log_infos purging, and realized I had in fact set up an optional argument for purging entries when I originally added the feature (this is what I get for adding things I don't always use). It's -K or --log-event-keep, log-event-keep in the config file, or $GOIARDI_LOG_EVENT_KEEP as an environment variable.
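For illustration, the three equivalent ways to set this might look like the fragment below. Per goiardi's documentation the value is a count of events to retain (older ones are purged down to that number), but the exact semantics and `10000` are assumptions to verify against the docs.

```shell
# Keep only the most recent log events (value assumed to be an event count):
goiardi -K 10000                        # command-line flag
# or in the goiardi config file:
#   log-event-keep = 10000
# or via the environment:
export GOIARDI_LOG_EVENT_KEEP=10000
```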

That said, those log_infos entries can still take up way too much space. Using EXTENDED helps a bit, but I'm looking at good ways to make that better.

rmoriz commented 6 years ago

GOIARDI_LOG_EVENT_KEEP does not purge node_statuses and sandboxes for me in a Postgres-based setup. I also wonder whether this could be the reason for the memory "leak" when using the Go-based data store.

ctdk commented 6 years ago

The wheels turn slowly, but they've started turning again. I've pushed up another prerelease that takes a simpler approach to tackling the log info memory usage (optionally skipping recording that information), after backing off more complicated ideas like storing diffs, at least for now. Work on the node status and report purging has actually started now too.

It needs more testing, but so far it looks good.

ctdk commented 6 years ago

The node statuses and reports should be dealt with in the latest release (yay). I haven't been able to get an answer on the sandbox issue yet, though, so for now I'm leaving that be. Closing this out for now, but I'll keep the sandbox cleaning in mind.

julian7 commented 6 years ago

@ctdk you had more time than I have :) I'll look into this as soon as I can.