Closed JeanMarie-PM closed 11 months ago
Comment in slack by @mogul
If GSA SOC allows us to ship logs to their SIEM-as-a-Service (I haven't reviewed the SSPP to see if we say we are doing this, but it's an OMB mandate that eventually all apps ship logs to an agency-centric endpoint) then we should also be deploying the logstack stuff I wrote which drops logs into an S3 endpoint that they ingest.
It looks like basic logging info is already available in cloud.gov. What requirements do we need to address regarding logging for the MVP? @jadudm, @mogul
https://logs.fr.cloud.gov/ > Left Blade > Kibana > Discover
Note:
@mogul mind if I get your assistance on this item?
Let's review whether the New Relic agent is in fact picking up logs on its own or not. If not, then we should
For the latter point about shipping logs to an S3 bucket for consumption by the GSA SOCaaS: There's a bullet up here about that which hasn't been broken out yet. Let's consider that out of scope for this particular issue. (When we have time to take it on, that would likely work from this example, though I'd like to use a Terraform module to implement that.)
@asteel-gsa, @akf has just gotten cg-logshipper into shape; you set it up to drain logs from Cloud Foundry, and it ships them to both New Relic and S3. Do you want to work on this issue sometime soon? If so we should probably meet and talk about what it would take to implement a Terraform module that deploys cg-logshipper.
@mogul totally. So far our new relic implementation isn't there, pending resolution with NR support, but we can set it up whenever. Pending any backup/restore testing with JMM, or new relic suddenly working and digging into that task, should be free whenever to work on this with you.
@asteel-gsa, let me know when you want to consider this issue closed. The ticket has almost nothing up top.
Is acceptance "set up NR," or is it "set up NR and ship through cg-logshipper?"
AC would be ship logs to NR via cg-logshipper
Let's leave this in backlog for now, we do want to do this, but only after we have confirmed all environments reporting to NR.
@mogul now that we have new relic configured properly, do you want to find some time to get this implemented?
Sure, how about late this week or Monday the week after next?
Works for me, we can aim for Friday. I'll put some time on the calendar.
Alex and I spent a good chunk of time today to sketch out the details, and we've groomed the initial post accordingly. If anyone has questions or concerns about this approach, now's a good time to bring them up, before we break ground!
At a glance
In order to ensure logs appear in places where people can mine and alert on them, as a FAC devops-oriented person, I want cloud.gov app logs and metrics to be shipped to New Relic (for our own alerting purposes) and to an S3 bucket (for the GSA SOC to ingest for GSA IT's alerting purposes).
Acceptance Criteria
We use DRY behavior-driven development wherever possible.
Scenario: Logs are flowing to New Relic
Given I am authenticated with New Relic
When I review logs for the gsa-fac app
...
Scenario: Logs are flowing to the S3 bucket
Given I have a service-key for the FAC logs S3 instance
When I look at the content of the S3 bucket
...
Shepherd
Background
cloud.gov doesn't offer alerting capabilities out of the box, so that's why we're going to ship logs off to New Relic, where we can set up alerts.
In addition, OMB directive M-21-31 says that agencies should stovepipe logs into a central agency-wide SOC. So that's why we're going to ship logs to an S3 bucket (that the GSA SOCaaS can pull from).
Security Considerations
Required per CM-4.
We are ensuring that the cg-logshipper app uses the egress proxy to communicate with New Relic, and the egress proxy requires client credentials. We're also ensuring that the cg-logshipper app itself requires client credentials. Connections to brokered S3 buckets are already routed over a cloud.gov internal endpoint. In all hops (app to logshipper, logshipper to egress proxy, logshipper to New Relic, logshipper to S3) the traffic is secured with TLS.
For our initial implementation the cg-logshipper app and S3 bucket will be in the same space as the apps whose logs it is draining. A team member acting as an insider threat could possibly tamper with the logshipper app or the bucket content using their SpaceDeveloper access. However, that's a remote concern. For our initial implementation we're considering that concern out of scope and we're noting mitigation of that concern as a "potential future enhancement" below. (Also note that the logs that go to logs.fr.cloud.gov and New Relic are tamper-resistant and serve as a comparison point for the S3 content in case an insider threat is identified.)
Sketch
We're thinking we'll write a Terraform module that deploys the cg-logshipper app, similar to the existing https-proxy module.
Since we're not all that familiar with the raw output from Cloud Foundry, it may be helpful to look at the cloud.gov ELK configuration to see how they're processing raw output from CF on its way into logs.fr.cloud.gov (where a bunch of fields are parsed out). Here are the ELK (old) and OpenSearch (new) versions of the logs.fr.cloud.gov stack.
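To get a feel for what "parsing fields out" of raw CF output could look like, here is a minimal sketch in Python. It assumes the raw lines arrive in RFC 5424 syslog format, which is how Cloud Foundry documents its syslog drains; what cg-logshipper actually receives may differ. The sample line, the org/space/app names, and the function name `parse_syslog_line` are all made up for illustration and are not taken from the cloud.gov configuration.

```python
import re

# RFC 5424 layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
# Assumption: structured data is either "-" or one or more simple [..] elements.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured_data>-|(?:\[[^\]]*\])+)"
    r"(?: (?P<message>.*))?$"
)

# Hypothetical sample line; every value here is invented for the example.
SAMPLE_LINE = (
    "<14>1 2023-05-01T12:00:00.000000+00:00 "
    "example-org.example-space.example-app "
    "00000000-0000-0000-0000-000000000000 "
    "[APP/PROC/WEB/0] - - GET /health 200"
)


def parse_syslog_line(line):
    """Split one RFC 5424 line into a dict of fields, or return None if it does not match."""
    match = SYSLOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # PRI encodes facility and severity as facility * 8 + severity.
    pri = int(fields.pop("pri"))
    fields["facility"] = pri // 8
    fields["severity"] = pri % 8
    fields["message"] = fields["message"] or ""
    return fields


if __name__ == "__main__":
    print(parse_syslog_line(SAMPLE_LINE))
```

The real ELK/OpenSearch pipelines parse out many more fields than this; the sketch only shows the header split that anything downstream would build on.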
Potential future enhancements (other stories)
For machine identification: We want to have a concrete test that will sieve out lines specifically delivered by the logshipper to verify that everything is working, rather than having to check it's working as a human looking at the UI. In logs.fr.cloud.gov there's a cf_origin:firehose field; we are hoping we can implement something like that for the logshipper in New Relic.
For moving the logshipper app and bucket to another space: This addresses a potential insider threat consideration, so they can't create service bindings and mess with the content of the S3 bucket; only admins (who have direct access to that other space) can do that.
Process checklist
# Sketch

[comment]: # "Notes or a checklist reflecting our understanding of the selected approach"

- [ ] Design designs all the things
- [ ] Engineering engineers all the things

# Definition of Done

## Triage

### If not likely to be important in the next quarter...
- [x] Archived from the board

### Otherwise...
- [x] Has a clear story statement
- [x] Design or Engineering accepts that it belongs in their respective backlog

## Design Backlog
- [-] Has clearly stated/testable acceptance criteria
- [-] Meets the design Definition of Ready [citation needed]
- [-] A design shepherd has been identified

## Design In Progress
- [-] Meets the design Definition of Done [citation needed]

## Design Review Needed
- [-] Necessary outside review/sign-off was provided

## Design Done
- [-] Presented in a sprint review
- [-] Includes screenshots or references to artifacts

### If no engineering is necessary
- [-] Tagged with the sprint where it was finished
- [-] Archived

## Engineering Backlog
- [x] Has clearly stated/testable acceptance criteria
- [x] Has a sketch or list of tasks
- [x] Can reasonably be done in a few days (otherwise, split this up!)

## Engineering Available
- [ ] There's capacity in the `In Progress` column
- [ ] An engineering shepherd has been identified

## Engineering In Progress
- [ ] Meets acceptance criteria
- [ ] Meets [QASP conditions](https://derisking-guide.18f.gov/qasp/)

### If there's UI...
- [ ] Screen reader - Listen to the experience with a screen reader extension, ensure the information presented in order
- [ ] Keyboard navigation - Run through acceptance criteria with keyboard tabs, ensure it works.
- [ ] Text scaling - Adjust viewport to 1280 pixels wide and zoom to 200%, ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.

## Engineering Blocked
- [ ] Blocker removed/resolved

## Engineering Review Needed
- [ ] Outside review/sign-off was provided

## Engineering Done
- [ ] Presented in a sprint review
- [ ] Includes screenshots or references to artifacts
- [ ] Tagged with the sprint where it was finished
- [ ] Archived