Create a security program; make sure it eliminates bottlenecks

paulmelnikow commented 5 years ago

From https://github.com/badges/shields/pull/2573#issuecomment-449486184:

Maybe we should even store our own internal documentation for shields.io separate from documentation that is applicable to contributors or self-hosting users.

I recently created a private repo for the secrets needed to deploy the server. I paused as I was filling it out though, because I realized we may want to make some careful decisions about what should be shared with whom.

This project is widely trusted by the community. We should take precautions to ensure our users' continued security and to maintain our reputation for privacy.

However, depending on one person is bad. Transparency is good. Trusting people is necessary. Security cannot be a roadblock to progress.

We are currently using branch protection and required checks. However, permissions have been ad hoc.

It would be good if at least two active maintainers had access to each key area of the project (DNS, CDN, servers, monitoring, GitHub org, npmjs, Twitter, Discord, dev infrastructure).
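
To make the "at least two people per key area" goal checkable, the access list could be kept as data and audited periodically. The sketch below is hypothetical: the area names echo the list above, but the maintainer names, `MIN_PEOPLE`, and the `under_covered` helper are placeholders, not an existing Shields.io tool. It counts only active maintainers, so access held by someone who has stepped back does not hide a bottleneck.

```python
from typing import Dict, List, Set

# Minimum number of active maintainers who should be able to access each area.
MIN_PEOPLE = 2


def under_covered(
    access: Dict[str, Set[str]], active: Set[str], minimum: int = MIN_PEOPLE
) -> List[str]:
    """Return the areas where fewer than `minimum` active maintainers have access."""
    return sorted(
        area for area, people in access.items() if len(people & active) < minimum
    )


if __name__ == "__main__":
    # Hypothetical inventory; names are placeholders, not the real access list.
    access = {
        "DNS": {"alice", "bob"},
        "CDN": {"alice"},
        "npmjs": {"alice", "carol"},
        "discord": {"bob", "dave"},  # dave is no longer active
    }
    active = {"alice", "bob", "carol"}
    print(under_covered(access, active))  # ['CDN', 'discord']
```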

In the interest of driving things forward on good footing, I asked some open-source maintainers about how their projects handle volunteer-run, user-facing infrastructure:

Do you enforce required code approvals before merge?
Do you have policies around security and privacy?
How do you ensure no one person becomes a bottleneck?
How do you ensure an appropriate amount of transparency?

I got advice about some aspects of this:

Use a PaaS that doesn't require SSH, grants permissions based on teams and orgs, provides activity logs for transparency, and avoids access and knowledge bottlenecks
Follow the 12-factor methodology, so no sensitive files are checked in; environment-specific configuration lives only in the environment
Use branch protection, reducing the chance of something shady getting merged or deployed
Use required CI checks
Use a bot that creates issues from log events, to avoid bottlenecks
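
The 12-factor point can be illustrated with a small loader that reads secrets only from the process environment and fails fast when one is missing, so nothing sensitive has to live in the repository. This is a sketch under assumptions: the variable names `DEPLOY_API_TOKEN` and `METRICS_DSN`, the `ConfigError` class, and the `load_config` helper are hypothetical, not the server's actual configuration.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical names; the real deployment defines its own variables.
REQUIRED_VARS = ("DEPLOY_API_TOKEN", "METRICS_DSN")


class ConfigError(RuntimeError):
    """Raised when a required setting is absent from the environment."""


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read required settings from the environment, never from checked-in files."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail at startup rather than partway through serving requests.
        raise ConfigError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

In production the function would be called with no argument, so it reads `os.environ`; passing a mapping explicitly keeps it easy to test.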

However, I didn't get any information on training, or deciding who gets access to what.

I'd like to suggest we take the initiative to design a program that addresses the most important aspects of security.

We should:

Decide how we want to handle these things
Write down our decisions
Get everyone to approve it, and
Implement it.

Ideally, rather than reinvent the wheel, we should find someone who already has such a program and adapt it to our needs.

Are folks on board?

chris48s commented 5 years ago

I recently created a private repo for the secrets needed to deploy the server.

This isn't necessarily a bad thing to do, but if we're going to do it, we should encrypt what's in it. Even in a private repo, the secrets are still stored as plain text on GitHub, so they could be read in the clear by (for example) GitHub employees.

Beyond that, happy to have a think about these issues. I don't have definite answers to all the questions you're posing at this stage though.

calebcartwright commented 5 years ago

I know I'm the new one here and have zero info around the historical context, current deployment process, etc. :smile:

However, on the secrets piece: I'd be curious how feasible it would be to keep those secrets in a secure token store, from which they could be pulled and dynamically injected into the runtime environment as part of the deployment process (assuming there already is an automated deployment pipeline, or that one could be built).
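
One way to picture this: the deploy step fetches each secret from the store and hands it to the server process only through that process's environment. The sketch assumes a generic `get(name)` interface; `SecretStore`, `InMemoryStore`, `build_runtime_env`, and `launch` are hypothetical names, and a real store such as HashiCorp Vault has its own client API that would sit behind that interface.

```python
import subprocess
from typing import Dict, Iterable, List, Mapping, Protocol


class SecretStore(Protocol):
    """Hypothetical minimal interface a real secret store client would be wrapped in."""

    def get(self, name: str) -> str: ...


class InMemoryStore:
    """Stand-in store for illustration and tests only; it holds secrets in memory."""

    def __init__(self, secrets: Mapping[str, str]) -> None:
        self._secrets = dict(secrets)

    def get(self, name: str) -> str:
        return self._secrets[name]


def build_runtime_env(
    base_env: Mapping[str, str], store: SecretStore, names: Iterable[str]
) -> Dict[str, str]:
    """Return a copy of `base_env` with the named secrets pulled from `store`."""
    env = dict(base_env)  # copy, so the deploy tool's own environment is untouched
    for name in names:
        env[name] = store.get(name)
    return env


def launch(cmd: List[str], env: Mapping[str, str]) -> None:
    """Start the server with secrets present only in its environment, not on disk."""
    subprocess.run(cmd, env=dict(env), check=True)
```

A deploy script would call `launch` with the server's start command and the environment returned by `build_runtime_env`.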

chris48s commented 5 years ago

:+1: I usually use Ansible Vault to encrypt secrets and decrypt them at deploy time, but other options are available, and another choice may suit us better. Whatever technology we pick, there is still a social choice to make about the permission model, i.e., which group of people can read from and write to the encrypted store.
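
The social choice can still be written down precisely once it is made. A minimal sketch, assuming read and write are the only actions and that access is granted per group: the group names, members, `GRANTS` table, and `allowed` helper are hypothetical, and deciding the actual membership is exactly the open question raised above.

```python
from enum import Flag, auto
from typing import Dict, Mapping, Set


class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


# Hypothetical groups and grants; the real membership is the decision to be made.
GROUPS: Dict[str, Set[str]] = {
    "ops": {"alice", "bob"},
    "deploy-bot": {"ci"},
}
GRANTS: Dict[str, Perm] = {
    "ops": Perm.READ | Perm.WRITE,  # can rotate and add secrets
    "deploy-bot": Perm.READ,        # can only decrypt at deploy time
}


def allowed(
    user: str,
    perm: Perm,
    groups: Mapping[str, Set[str]] = GROUPS,
    grants: Mapping[str, Perm] = GRANTS,
) -> bool:
    """True if the union of the user's group grants includes every bit in `perm`."""
    granted = Perm.NONE
    for group, members in groups.items():
        if user in members:
            granted |= grants.get(group, Perm.NONE)
    return perm in granted
```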

calebcartwright commented 5 years ago

Definitely!

At my day job I work in a highly regulated environment so we often deal with a lot of the same questions/considerations (especially when deploying a new solution).

This may be overkill here, but one of the things I like to do in those situations is create some models that describe the various components/capabilities that are needed, sequence flows for various scenarios (like deployment process), etc.

We design these models using a functional/capability driven perspective, i.e. listing the function/capability (the what) like cache, monitoring, token/secret store, etc. We then do an overlay on top of those models covering who should have access to what on those respective components/actors, etc.

That then gives us a visualized, common understanding of the ecosystem along with who has access to do what and where, independent of tools/solutions (because given enough time tools/solutions will eventually change). Finally, we'll add another overlay of the various solutions (the how) like cache (Redis), secret/token store (Vault), etc. that we plan on using.

We often find that the permissions/access overlay plays a role in the selection of the tool/solution (for example, does a tool/solution we're considering provide the level of granularity in permissions/access management that we determined we need?).
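
The layered model described above (capabilities, then an access overlay, then a solution overlay) could be captured as plain data, which also makes the tool-selection check mechanical. This is an illustrative sketch, not a description of any specific team's tooling: `Capability`, `Tool`, `gaps`, and `pick`, as well as the example roles and candidate names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Capability:
    name: str                    # the "what", e.g. "secret store"
    access: Dict[str, Set[str]]  # access overlay: role -> actions that role needs
    solution: str = ""           # solution overlay: the "how", chosen last


@dataclass
class Tool:
    name: str
    controllable: Set[str]       # actions the tool can grant or deny individually


def gaps(cap: Capability, tool: Tool) -> Set[str]:
    """Actions the access overlay needs that the tool cannot control separately."""
    needed: Set[str] = set().union(*cap.access.values())
    return needed - tool.controllable


def pick(cap: Capability, tools: List[Tool]) -> List[str]:
    """Names of candidate tools that cover every action in the access overlay."""
    return [tool.name for tool in tools if not gaps(cap, tool)]


if __name__ == "__main__":
    # Hypothetical roles and candidates, not real products.
    secrets = Capability(
        "secret store", {"ops": {"read", "write"}, "ci": {"read"}, "auditor": {"audit-log"}}
    )
    candidates = [
        Tool("candidate-a", {"read", "write", "audit-log"}),
        Tool("candidate-b", {"read", "write"}),
    ]
    print(pick(secrets, candidates))     # ['candidate-a']
    print(gaps(secrets, candidates[1]))  # {'audit-log'}
```

Keeping the access overlay independent of the `solution` field mirrors the point above: tools can be swapped later without redrawing who needs to do what.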

Unsurprisingly, access to hosts/runtime environments and to the secrets/tokens/certs/etc. used in production is almost always tightly locked down, but we always strive for a fully automated process, even if that process often includes a manual approval by one or more super admins before changes are applied (automatically) to the production environments.
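
The "fully automated, but gated by a manual approval" pattern can be sketched as follows. The approver list, the approval threshold, and the `approve`/`maybe_apply` helpers are assumptions for illustration; many CI/CD systems offer a built-in equivalent, which would normally be preferable to custom code.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

# Hypothetical approver list and threshold.
SUPER_ADMINS: Set[str] = {"alice", "bob"}
REQUIRED_APPROVALS = 1


@dataclass
class Change:
    change_id: str
    approvals: Set[str] = field(default_factory=set)


def approve(change: Change, user: str, admins: Set[str] = SUPER_ADMINS) -> None:
    """Record an approval; only super admins may approve production changes."""
    if user not in admins:
        raise PermissionError(f"{user} may not approve production changes")
    change.approvals.add(user)


def maybe_apply(
    change: Change,
    apply: Callable[[Change], None],
    required: int = REQUIRED_APPROVALS,
) -> bool:
    """Hand the change to the automated rollout once enough approvals exist."""
    if len(change.approvals) >= required:
        apply(change)
        return True
    return False
```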

Food for thought!

paulmelnikow commented 5 years ago

Just a note that as of #3631, PNG generation is being moved into a microservice that is deployed separately and hosted on Zeit Now. I opened #3647 for a version of this issue scoped to the microservice. Let's be sure to keep these issues in mind.

chris48s commented 4 years ago

Now feels like a timely point to revisit this.

chris48s commented 3 years ago

This was quite a broad issue that touched on several topics. I think over the last year or two we have done a pretty good job of addressing many of the issues raised in the top post:

We've established an ops team
We've got communication channels for maintainers to discuss any issues which aren't suitable for a public forum
We have regular(-ish) meetings
Pretty much every essential service we rely on has multiple people on the core team who can access it and know how it works
Shields.io is now served from a PaaS - no SSH/servers/OS patches needed
Everyone is using 2FA on everything
We have established a Security Policy: https://github.com/badges/shields/security/policy
We have a security@shields.io email address where people can contact us
We use branch protection and required approvals

Personally I think we are in a pretty good place now. Do we think there are still outstanding tasks here, or shall we close this?

paulmelnikow commented 3 years ago

I agree. We are in way better shape now!

A couple important things feel outstanding:

While the ops team can manage the DNS records for shields.io on Cloudflare, none of us has access to the domain registration. Ideally we'd have a legal structure in place, such as an association, that could legally own the domain. In the absence of that, I think it should be registered to Shields.io (perhaps at my business address in Brooklyn, where I can receive mail for Shields.io), with access in the hands of two active members of the ops team.
The security policy we have covers the Shields.io code. I think it would be a good idea to write a security policy for the production systems as well. It would enumerate our practices and policies around who has access and how that access is supposed to be used. This is important, but somewhat lower priority than the domain name.


Create a security program; make sure it eliminates bottlenecks #2577