CRAVEX: Collect and normalize exploit pointers

aboutcode-org / vulnerablecode

A free and open vulnerabilities database and the packages they impact. And the tools to aggregate and correlate these vulnerabilities. Sponsored by NLnet https://nlnet.nl/project/vulnerabilitydatabase/ for https://www.aboutcode.org/ Chat at https://gitter.im/aboutcode-org/vulnerablecode Docs at https://vulnerablecode.readthedocs.org/

https://public.vulnerablecode.io

Apache License 2.0


CRAVEX: Collect and normalize exploit pointers #95

Open pombredanne opened 5 years ago

pombredanne commented 5 years ago

We want to collect data about exploits.

See discussion document at https://docs.google.com/document/d/1XtMmxthmANhr-IqXsyMgFnrOq5fTGfsE/edit?usp=sharing&ouid=117241222429542576816&rtpof=true&sd=true

See work-in-progress normalized model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing

armijnhemel commented 1 year ago

exploitdb has moved to https://gitlab.com/exploit-database/exploitdb

pombredanne commented 1 year ago

This is a nice dataset:

https://github.com/nomi-sec/NVD-Exploit-List-Ja (the license is TBD though, see https://github.com/nomi-sec/NVD-Exploit-List-Ja/issues/1)

Also:

https://github.com/Patrowl/PatrowlHearsData is Apache-licensed
https://github.com/CERTCC/labyrinth/ by @ahouseholder now has a license and already aggregates the above.

pombredanne commented 1 year ago

And also https://github.com/nomi-sec/PoC-in-GitHub

ahouseholder commented 1 year ago

We're tagging vul IDs (more than just CVE) at

We pull updates from their respective repositories every few hours, crawl the diffs for IDs we recognize, and then tag the commit in which the ID first appeared.

The ID patterns we look for are here: https://github.com/CERTCC/git_vul_driller/blob/dd49cec61aac5ee9e84d57313a7876145e0b1522/git_vul_driller/patterns.py#L15-L56
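The crawl-and-tag approach described above can be sketched roughly as follows. This is a minimal illustration, not the git_vul_driller implementation: the single CVE pattern, the function names `find_ids_in_diff` and `first_seen`, and the `(sha, diff_text)` input shape are all assumptions made for the example. The real pattern list linked above covers more ID types.

```python
import re

# Hypothetical, simplified pattern table. The actual patterns live in
# git_vul_driller/patterns.py and cover more than CVE IDs.
ID_PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def find_ids_in_diff(diff_text):
    """Return the set of vulnerability IDs found on added lines of a unified diff."""
    found = set()
    for line in diff_text.splitlines():
        # Only look at added lines, and skip the "+++ b/file" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for pattern in ID_PATTERNS.values():
            found.update(match.group(0).upper() for match in pattern.finditer(line))
    return found


def first_seen(commits):
    """Map each ID to the first commit whose diff adds it.

    `commits` is an iterable of (sha, diff_text) pairs, oldest first.
    """
    seen = {}
    for sha, diff_text in commits:
        for vul_id in find_ids_in_diff(diff_text):
            # setdefault keeps the earliest commit for each ID.
            seen.setdefault(vul_id, sha)
    return seen
```

Feeding the commits oldest first means the recorded commit is the one in which the ID first appeared, which is the commit that would get tagged.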

ahouseholder commented 1 year ago

> https://github.com/CERTCC/labyrinth/ by @ahouseholder now has a license and already aggregates the above.

Just to set expectations on data quality: please be aware of the notes about signal-to-noise in the Labyrinth README. An ID might show up in Labyrinth because there's an exploit repo that mentions it, or for a number of other relatively benign reasons, because our code isn't smart enough to tell the difference. Labyrinth's findings are meant to serve as input to an analysis process, not as a production exploit feed.

ahouseholder commented 1 year ago

On the other hand, we're more confident about the exploitdb/metasploit tags indicating exploits because there's a human vetting process involved (i.e., their developers decide what to include in their product).

pombredanne commented 1 year ago

@ahouseholder Thank you for the valuable insights. In the end, I want to know if my code is vulnerable. So the idea here with exploits is this, in combination with reachability:

1. Using scancode to detect packages (PURL), or any of the many other tools that use PURLs, and a lookup in VulnerableCode or in any other vulnerability DB that uses PURL, I know I am potentially vulnerable.
2. Then I would like to automate as much as possible finding out whether my usage of the vulnerable package is exploitable. For this there are two tracks that I think can help:
   2.1. Static reachability: given knowledge of a fix commit and a static analysis of the vulnerable library's interaction with my code, do I use any of the vulnerable code paths?
   2.2. Dynamic exploitability: given an exploit script (eventually curated to conform to a common interface and setup), is my configuration exploitable?

And if I am either exploitable or the vulnerable code is reachable, then I need to patch (possibly with the fix commit).
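The decision rule in this comment could be sketched as below. Only the "patch if reachable or exploitable" rule comes from the comment itself. The `Finding` class, its field names, and the "investigate" and "monitor" outcomes are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """A potentially vulnerable package found via a PURL lookup (hypothetical shape)."""
    purl: str
    vulnerability_id: str
    # None means the corresponding analysis has not been run yet.
    reachable: Optional[bool] = None    # track 2.1: static reachability
    exploitable: Optional[bool] = None  # track 2.2: dynamic exploitability


def triage(finding):
    """Return a suggested next action for a finding."""
    # Rule from the comment: either signal is enough to require a patch.
    if finding.reachable or finding.exploitable:
        return "patch"
    # Assumption: if any analysis is still missing, keep investigating.
    if finding.reachable is None or finding.exploitable is None:
        return "investigate"
    # Assumption: both analyses ran and neither flagged the package.
    return "monitor"
```

For example, `triage(Finding("pkg:generic/example@1.0", "VULN-EXAMPLE", reachable=True))` returns `"patch"` even though the exploitability check has not run.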

armijnhemel commented 1 year ago

DennisClark commented 3 months ago

See discussion document at https://docs.google.com/document/d/1XtMmxthmANhr-IqXsyMgFnrOq5fTGfsE/edit?usp=sharing&ouid=117241222429542576816&rtpof=true&sd=true

See work-in-progress normalized model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing

DennisClark commented 3 months ago

The proposed normalized Exploits model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing is ready for review.

DennisClark commented 3 months ago

The proposed normalized Exploits model spreadsheet at https://docs.google.com/spreadsheets/d/1J2t2T_s015pnAouy5ss-AA0SI4e2xjT4uICjlL_Aa38/edit?usp=sharing has been reviewed and is ready for implementation.

ziadhany commented 2 months ago

@DennisClark I’ve been working on the Exploit model but encountered a few challenges:

The resources_and_notes in Metasploit is a set of options like this:

"notes": {
    "AKA": ["Zerologon"],
    "Stability": ["crash-safe"],
    "Reliability": [],
    "SideEffects": ["config-changes", "ioc-in-logs"]
}

Should we store it as JSON or just as a string?
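To make the trade-off concrete, here is a small sketch of both options using only the standard library. The `flatten_notes` helper and its output format are hypothetical, and this is not the VulnerableCode model. In a Django model the two options would roughly correspond to a `JSONField` versus a `TextField`.

```python
import json

# The Metasploit "notes" value from the example above.
notes = {
    "AKA": ["Zerologon"],
    "Stability": ["crash-safe"],
    "Reliability": [],
    "SideEffects": ["config-changes", "ioc-in-logs"],
}

# Option 1: keep the structure. The value round-trips without loss,
# so individual keys can still be inspected later.
stored_json = json.dumps(notes, sort_keys=True)
restored = json.loads(stored_json)


# Option 2: flatten to a display string (hypothetical format).
# Easier to show to users, but the structure is lost.
def flatten_notes(notes):
    parts = []
    for key, values in notes.items():
        if values:  # skip empty lists such as "Reliability"
            parts.append(f"{key}: {', '.join(values)}")
    return "; ".join(parts)


stored_text = flatten_notes(notes)
```

With JSON, a question like "which exploits have `ioc-in-logs` as a side effect?" stays a key lookup on `restored["SideEffects"]`. With the flattened string it becomes substring matching.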

I'm also unsure about what should be stored in source_url for Metasploit, as references can include multiple URLs. Any guidance on this?