Closed: archgrove closed this issue 6 years ago
@jpluscplusm has pointed me at https://bosh.io/jobs/credhub?source=github.com/pivotal-cf/credhub-release&version=1.7.2#p=credhub.log_level, which seems likely to be useful.
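For reference, setting that property would be done in the deployment manifest. A sketch only; the instance-group, job, and release names below are illustrative and depend on how concourse-up renders its manifest:

```yaml
# Sketch: lower CredHub's log verbosity via the credhub.log_level
# property linked above. Names here are assumptions, not the actual
# concourse-up manifest layout.
instance_groups:
- name: credhub
  jobs:
  - name: credhub
    release: credhub
    properties:
      credhub:
        log_level: warn
```

Note that this governs the application log output, not the audit records written to the database, so it may not help with the disk-usage problem below.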
Hi Adam, thanks for raising this issue. We've also seen one of our own deployments suffer a Postgres storage issue and we're looking into it. There are also some other issues relating to 3.9.0 that are causing some headaches. Please stay tuned!
Thanks @danyoung! I've done some more CredHub-level spelunking, and it looks like the audit logs are non-negotiable (they can't be turned off, or turned down). So in some ways, this feels like a credhub "issue".
In terms of concourse-up, a post-insert TRIGGER to GC old audit entries might be viable? Or if we're feeling feisty, an actual cron job for cleanup?
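For illustration, a statement-level trigger along those lines might look like the sketch below. The `created_at` column name is an assumption (the audit rows clearly carry an epoch-milliseconds timestamp, but I haven't checked CredHub's actual column names), and the 30-day window is arbitrary:

```sql
-- Hypothetical GC trigger: after inserts into request_audit_record,
-- delete audit rows older than 30 days. The "created_at" column name
-- is assumed; the value is compared as epoch milliseconds.
CREATE OR REPLACE FUNCTION gc_old_audit_records() RETURNS trigger AS $$
BEGIN
  DELETE FROM request_audit_record
   WHERE created_at < extract(epoch FROM now() - interval '30 days') * 1000;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER gc_request_audit
  AFTER INSERT ON request_audit_record
  FOR EACH STATEMENT
  EXECUTE PROCEDURE gc_old_audit_records();
```

Running a DELETE on every insert would be heavy at the insert rates seen here, so a scheduled job that batches the deletes would probably be kinder to the database.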
@archgrove we have a bug to address this issue in concourse-up https://www.pivotaltracker.com/story/show/155592413
@archgrove Please try the latest release for a fix to this issue https://github.com/EngineerBetter/concourse-up/releases/tag/0.8.3
We use concourse-up to manage our Concourse. Our usage is, I believe, fairly mundane: a dozen or so pipelines, with a reasonable number of credentials managed by the bundled credhub. We have three teams (including main), authenticated via GitHub OAuth.

Our setup failed this morning. Manifestations were credhub-cli rejecting logins with "bad credentials", and git resource checks failing. The git resource was complaining about pgsql disk space usage; alas, I did not keep the exact error.

Checking RDS, the Postgres disk had indeed filled up - all 10 GB. I resized it to restore service, and tunnelled in to find database usage of:
The relation size in credhub was:
I’m not a credhub expert. Things I guess might be useful in diagnosing this:
select count(distinct uaa_url) from request_audit_record
gives 1; the record is https://an_ip:8443/oauth/token
select count(*) from request_audit_record;
gives 17735751. A sample row:
18c85002-8fae-4d7e-9aa4-bad4610f9e43 | 127.0.0.1 | 1516213545469 | /api/v1/data | 127.0.0.1 | 1516210813 | 1516214413 | https://an_ip:8443/oauth/token | | | | credhub.write,credhub.read | client_credentials | atc_to_credhub | GET | 200 | path=<73 characters redacted> | uaa
select count(*) from event_audit_record
gives 17740039.

select operation, count(*) from event_audit_record group by operation;
gives:
b2b1aa8a-10e1-4777-b742-07df841918fb | 7d6d796e-d391-496f-90bd-253ed2cc55c0 | 1516111765973 | credential_update | <redacted 73 characters of credential path> | uaa-user:94d61c71-12e4-42ce-9d59-03292aa2c382 | t
Evidently, something about our setup is causing an unexpectedly large number of credhub uses (perhaps the constant git polling?). I will leave the tables intact for a few days in case they are useful for further diagnostics, but will have to truncate them sooner rather than later.
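For anyone else hitting the same wall, the stop-gap I have in mind is simply emptying the audit tables (table names taken from the queries above). TRUNCATE reclaims the disk space immediately, unlike a plain DELETE; truncating both tables in one statement sidesteps any foreign-key ordering between them. This discards the audit history, of course:

```sql
-- Stop-gap cleanup: drop all accumulated audit rows.
-- This loses the audit history for both tables.
TRUNCATE TABLE event_audit_record, request_audit_record;

-- Alternatively, to merely trim old rows by the epoch-millis timestamp
-- (the "created_at" column name is an assumption):
-- DELETE FROM request_audit_record
--  WHERE created_at < extract(epoch FROM now() - interval '7 days') * 1000;
```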
Let me know what I can do to help!
CC @jpluscplusm