Suggestion/Recommendation for CVE Announcement enhancements

zparnold commented 5 years ago

Hello there! I help lead the sig-docs-security working group in the Kubernetes project. We focus on surfacing and formatting security-related info as well as helping to amplify the CVE management process. During our meetings, a proposal arose to handle the CVE process in a more proactive way for our cluster operators. I asked about this during the sig-security session at KubeCon EU in Barcelona and was asked to open an issue here. The proposal is as follows. We would like to provide a very simple API that lets cluster operators register to receive alerts based on which version of a CNCF project they are using:

POST /v1/register

{
  "project": "kubernetes",
  "version": "1.10.13",
  "alert_group": [
    {
      "method": "phone",
      "number": "+1123567890"
    },
    {
      "method": "email",
      "number": "me@email.com"
    },
    {
      "method": "http",
      "number": "https://someurl.com/url"
    }
  ]
}

This API would then return (on successful validation) a unique token, drawn from a search space large enough that collisions are unlikely:

{
  "token":"kajshfdasjhdviurvubilrsbvaiurhvliauwhrvliuahrvliaurhiurhv"
}

This token can then be used to deregister the user who signed up:

POST /v1/deregister

{
  "token":"kajshfdasjhdviurvubilrsbvaiurhvliauwhrvliuahrvliaurhiurhv"
}

If the operation succeeds, the user is deregistered from updates.
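The register/deregister exchange above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: it assumes Python on the server side, and the `register` and `deregister` functions and the in-memory `registrations` store are hypothetical names. The token comes from the standard library's `secrets` module, which is one way to get a large search space.

```python
import secrets

ALLOWED_METHODS = {"phone", "email", "http"}  # methods named in the request example

# Hypothetical in-memory store; a real service would persist this somewhere.
registrations = {}

def register(payload: dict) -> dict:
    """Validate a registration payload and return a response with a fresh token."""
    for key in ("project", "version", "alert_group"):
        if key not in payload:
            raise ValueError(f"missing field: {key}")
    for target in payload["alert_group"]:
        if target.get("method") not in ALLOWED_METHODS:
            raise ValueError(f"unsupported method: {target.get('method')}")
    # 32 random bytes (256 bits) make guessing or colliding tokens impractical.
    token = secrets.token_urlsafe(32)
    registrations[token] = payload
    return {"token": token}

def deregister(token: str) -> bool:
    """Drop the registration for this token; return True if it existed."""
    return registrations.pop(token, None) is not None
```

Since the token is the only credential for deregistering, whoever holds it can remove the subscription, so it would need to be treated like a secret.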

Upon a verified CVE being published, this system could be used to trigger an alert to only those affected by the specific project/version combination. Because alerts go only to affected operators, they should be nearly all signal and very little noise as far as cluster operation and CVE management are concerned.
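Selecting who to alert could look something like the sketch below. It assumes an advisory lists the exact affected versions as strings (real advisories often give ranges, which would need proper version comparison); the `Registration` type and the `affected` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Registration:
    token: str
    project: str
    version: str

def affected(registrations, project, affected_versions):
    """Return registrations for this project whose version is listed as affected."""
    wanted = set(affected_versions)
    return [r for r in registrations
            if r.project == project and r.version in wanted]
```

For example, given registrations for Kubernetes 1.10.13 and 1.14.0, an advisory covering only 1.10.13 would select just the first one.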

Pros:

It would be easy to construct
It helps to give our downstream consumers a sense of ease because we are there to get them the information they need just in time to make infrastructure decisions
This improves upon RSS and allows us to push information to operators much more quickly than before
Since the IPv4 address space can now be scanned in under 1 hour on commodity hardware (https://zmap.io/), this gets information to operators as quickly as is feasible

Cons:

A human will have to maintain this and be on call 24/7 for CVE announcements to operate this system
"Who watches the watchman" syndrome can occur and alerts could be dropped
This system itself could be subject to malicious use since attackers could have the same information that cluster operators do in the same time period (security through obscurity)

Thoughts?

sftim commented 5 years ago

I'd be delighted to learn that there's already a good de facto standard for doing this, and that Kubernetes plans to follow that.

How about an ATOM / RSS feed where you can register a webhook for notifications, with a proof-of-work / CAPTCHA required to validate the registration?

Then, end users can strap anything they like onto the end of the webhook, and they can poll in case the webhook didn't arrive.

(extra credit: send the webhook in CloudEvent format, maybe with a digital signature).
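A rough sketch of that extra-credit idea, under several assumptions: the payload uses the CloudEvents 1.0 JSON attributes (`specversion`, `id`, `source`, `type` are the required ones), the `source` and `type` values shown are made up for illustration, and the "signature" is an HMAC-SHA256 over the body with a shared secret. An HMAC is a message authentication code rather than a public-key signature, so it is a stand-in here, not necessarily what was meant.

```python
import hashlib
import hmac
import uuid
from datetime import datetime, timezone

def build_event(project, cve_id, affected_versions):
    """Build a CloudEvents 1.0 style envelope for a CVE notification."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": f"/cve-feed/{project}",   # hypothetical source URI
        "type": "example.cve.published",    # hypothetical event type
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {"cve": cve_id, "affected_versions": affected_versions},
    }

def sign(body: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 of the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(body: bytes, secret: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    return hmac.compare_digest(sign(body, secret), signature)
```

The sender would serialize the event once, sign those exact bytes, and send the signature in a header; the receiver verifies against the raw body before parsing it.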

jimangel commented 5 years ago

I really like this idea. I would like to see more outlined for how the actual app / FaaS would be managed long term in an open way.

As far as the data being collected / token exchange, where would that data be stored? How do we ensure privacy?

Lastly, it would be great to have some way to easily update your Kubernetes version (in the event you patch / major upgrade). I could also see it being beneficial to receive an event "Hey, you haven't updated your version in X months/years, do you want to update or delete this alert?"

sftim commented 5 years ago

@jimangel it sounds like you think it'd be useful for the same ATOM / RSS feed to publish all releases, whether relevant to a CVE ID or not. (?)

zparnold commented 5 years ago

> I could also see it being beneficial to receive an event "Hey, you haven't updated your version in X months/years, do you want to update or delete this alert?"

I agree but it might be slightly out of scope, cause then it kinda becomes a marketing tool.

As far as how it will be managed, the actual components I'm envisioning are twofold.

The API is just a simple API deployed to any cloud provider (I would choose AWS because I have the most experience there), and the info would be stored in a GDPR-compliant (even though it's anonymous) DynamoDB table encrypted ten ways till Sunday. (Meaning that the table is encrypted at rest, the data itself is encrypted via AWS KMS, and all comms are via TLS.) That information is then used to subscribe users to an Amazon SNS topic that is specific to the version of the service they care about.
A worker function crawls these RSS feeds every 60 or so seconds (or we work in conjunction with the core security teams of these projects). When something is deemed actionable (how that determination is made can be decided later), one or more people are paged via OpsGenie (or another provider). They then make the final determination of whether a group needs to receive a message. If so, they manually (from the AWS console) issue an alert to the appropriate topic.
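The crawling step might be sketched as below. This is only an illustration of detecting new feed entries: it assumes an RSS 2.0 feed whose items carry a `guid` (falling back to `link`), leaves out the network fetch and the paging integration, and `new_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def new_items(feed_xml: str, seen: set) -> list:
    """Return feed items not seen before, recording their ids in `seen`."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer the guid; fall back to the link if the feed omits it.
        guid = item.findtext("guid") or item.findtext("link")
        if not guid or guid in seen:
            continue
        seen.add(guid)
        fresh.append({"guid": guid, "title": item.findtext("title", default="")})
    return fresh
```

A scheduler would call this roughly every 60 seconds with the latest feed body and hand any returned items to whoever is on call for the actionable/not-actionable decision.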

Does that make sense?

zparnold commented 5 years ago

@sftim The reasons I'm not advocating for an RSS approach are:

1) RSS is fundamentally a pull model, which offers cluster operators or OSS project users no additional benefit over any previous system.
2) The Kubernetes CVE team pushes out releases to the Kubernetes Announce mailing list, which also carries non-CVE-related info. That only adds to the problem, since we must filter out noise to distill the signal (CVEs) from it.

My thought process is that we could open a new mailing list for Kubernetes if we needed to go the RSS/ATOM route, but I was hoping not to.

sftim commented 5 years ago

What I'm suggesting is RSS-plus: a feed combined with a promise that you'll get a webhook when it changes.

The recipient of the webhook can verify the information by fetching the RSS / ATOM / whatever, if they want to.

Don't fully trust the webhook to arrive? You can still poll the feed.
Kubernetes project is worried about abuse? A proof-of-work / proof-of-pulse check before your webhook subscription gets approved.
Don't like CloudEvent payloads? The subscriber can strap something onto the receiving end of the webhook to transmute it into what they do want: SMS message, buzzer, Prometheus scrape, carrier pigeon release, etc.
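The proof-of-work check mentioned in the list above could be something like a hashcash-style puzzle. The sketch below is one possible shape, not an agreed design: the server hands out a random challenge, the subscriber searches for a nonce whose SHA-256 hash has a given number of leading zero bits, and the server verifies it with a single hash. All function names and the `challenge:nonce` encoding are assumptions.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def verify_pow(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check the subscriber's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

def solve_pow(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce, about 2**difficulty hashes on average."""
    for nonce in count():
        if verify_pow(challenge, nonce, difficulty):
            return nonce
```

Raising the difficulty by one bit roughly doubles the subscriber's cost while the server's verification cost stays constant, which is what makes it usable as an abuse throttle.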

sftim commented 5 years ago

Toy implementation: run https://github.com/skx/rss2hook on CNCF infrastructure with appropriate configuration. Put the configuration in a git repo, take pull requests, link that to the app with a ConfigMap.

zparnold commented 5 years ago

@sftim This makes more sense. Thanks for this by the way! I suppose I don't understand CloudEvents well. Is it possible for us to manage the effort around alerting in some common forms? (PagerDuty, OpsGenie, SMS, Phone Call, Email?)

I guess I'm aiming to have the CNCF/us be responsible for alerting people to the threat in a few commonly supported formats. So I love that they can still poll the RSS endpoint and I'm on board with that, but I don't understand the next step you're talking about (PoW or CE payloads).

sftim commented 5 years ago

My concern: does CNCF want to promise alerting by SMS & phone call for cluster operators worldwide? The foundation aims to be inclusive; offering a service only in some countries might look like the opposite. Offering the service globally could be inclusive but costly - I've seen this with international phone calls as part of multifactor authentication.

Offering a webhook and nothing else is a lowest common denominator: it's safe to assume that cluster operators have internet and fair to assume that they have something that can run a web server.

PS. The notification payload doesn't have to be CloudEvent; any JSON format would be fine. CloudEvent does seem like a nice standard though.

zparnold commented 5 years ago

@sftim It's sadly not necessarily safe to assume all clusters have access to the internet (I know of at least one really big cluster that is completely airgapped). I assume the cluster operator would have access to the internet, but then we're adding additional infrastructure for them to run.

As for CloudEvents I'm all for standards, so sure!

lizrice commented 5 years ago

I like the broad idea of this. cc @caniszczyk

caniszczyk commented 5 years ago

FYI GitHub now has a security advisory feature in beta you may want to take advantage of, happy to enable it for any CNCF project if you don't have it: https://help.github.com/en/articles/creating-a-maintainer-security-advisory

stale[bot] commented 4 years ago

This issue has been automatically marked as inactive because it has not had recent activity.

lumjjb commented 3 years ago

@PushkarJ do you think that this is of relevance still today? Any comments on this from k8s security perspective? If not we will probably close due to scope.

PushkarJ commented 3 years ago

@lumjjb Agree with you that the scope is too wide at the moment. We are experimenting with some ideas on triage, vulnerability resolution and transparency in Kubernetes sig-security. Perhaps once we have a working model and process, we could do a presentation about our approach at a CNCF TAG-Security meeting and then explore how this can be adopted across all the CNCF projects (maybe as graduation criteria) ;-)

So, with that said, we can close this and when the time is right, I will open a new issue and link it to this one at that time for completeness. Hope that works for all!!

stale[bot] commented 2 years ago

This issue has been automatically marked as inactive because it has not had recent activity.

lumjjb commented 2 years ago

Closing this issue for now. It is linked to the tracking k8s sig-security issue, and we will re-open it when ready to present to the TAG.

cncf / tag-security

Suggestion/Recommendation for CVE Announcement enhancements #170