RaitoBezarius opened this issue 1 year ago
As for general architecture, I think the local scanning tool should be as independent of the online security tracker as possible: for example, I think it should fetch its own copy of the vulnerability databases and match against that, instead of making that the responsibility of the online tracker. This will make it easier to evolve both independently, though of course they may share approach, ideas and even code. Also it will avoid making the online tracker 'critical infrastructure' and putting it under load.
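As a rough sketch of what "fetch its own copy" could look like (the OSV bulk-dump URL and the choice of ecosystem below are assumptions for illustration, not a decided design):

```python
# Sketch: download a bulk vulnerability dump once and match against it
# locally, so the online tracker never sits on the critical path.
# The OSV per-ecosystem dump URL below is an assumed example source.
import io
import json
import urllib.request
import zipfile

OSV_DUMP = "https://osv-vulnerabilities.storage.googleapis.com/{eco}/all.zip"

def fetch_osv_dump(ecosystem: str) -> list[dict]:
    """Download one ecosystem's advisories and return them as parsed JSON."""
    with urllib.request.urlopen(OSV_DUMP.format(eco=ecosystem)) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    return [json.loads(archive.read(name)) for name in archive.namelist()]

advisories = fetch_osv_dump("PyPI")
print(f"{len(advisories)} advisories cached for local matching")
```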
The local tracker can be split into a number of parts:
My expectation is that matching derivations against vulnerabilities will be our biggest challenge. Long-term we should likely take into account src attributes, applied patches, vendored dependencies, etc., but for now I think we should focus on the pname and the mappings collected at repology (they also have CPE bindings and 'manual' matching). Definitely let's look into what other tools like vulnxscan, trivy, syft/grype etc. do here, but I expect we'll be largely 'on our own'.
My expectation is that matching derivations against vulnerabilities will be our biggest challenge. Long-term we should likely take into account src attributes, applied patches, vendored dependencies, etc., but for now I think we should focus on the pname and the mappings collected at repology (they also have CPE bindings and 'manual' matching). Definitely let's look into what other tools like vulnxscan, trivy, syft/grype etc. do here, but I expect we'll be largely 'on our own'.
I disagree with this approach, because it is the surest way to build a tool that won't be used: it will generate too many false positives. Relying on the online tracker as critical infrastructure is fine. As you noted, we have plenty of vulnerability scanners, yet I am not aware of any of them being useful in large contexts. The reason for this, IMHO, is that they don't use the information generated by the Nixpkgs community, including the security team, etc.
To make this information available, someone has to massage it, centralize it, etc. That is what the security tracker is about. Avoiding the security tracker dependency just means rebuilding an N-th local vulnerability scanner that attempts to do better at reconciling derivations with vulnerabilities. From what you are saying, I don't see why this is tractable: something has to create the information, and while you can increase accuracy by piling up on heuristics, that can only buy you so much.
Of course, don't let this prevent the local software from sharing functionality with the security tracker, but I don't believe we should circumvent the whole concept of crowdsourcing the information.
I agree the matching will be our most important distinguishing feature from the many existing scanners, and I also agree that while we can already make serious improvements over the status quo by "piling up on heuristics" (and perhaps including the repology knowledge), it will be essential to include 'manually' collected knowledge from the security team and the wider crowd. The online tracker is indeed the logical place to collect that. I think it would work best if this information is published by the online tracker as a 'feed' that the local tool would consume: the data is essential, but the 'work'/logic of doing the actual matching could then be done locally. That should also sidestep some of the load/privacy concerns you mentioned in your initial post.
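To make the 'feed' idea concrete, here is a minimal sketch of what an entry and its local consumption could look like; every field name is invented for illustration (reusing the CVE-2023-45757 / brpc example discussed further down), and the naive string comparison stands in for real version-range logic:

```python
# Hypothetical feed entry published by the online tracker; the local tool
# downloads the whole feed and does all matching itself.
example_entry = {
    "cve": "CVE-2023-45757",
    "pnames": ["brpc"],           # curated mapping to nixpkgs pname(s)
    "affected_below": "1.6.1",    # fixed in 1.6.1 and later
    "source": "security-team",    # who asserted the match (team vs. crowd)
}

def locally_affected(entry: dict, pname: str, version: str) -> bool:
    """Naive local check; a real tool would parse versions properly."""
    return pname in entry["pnames"] and version < entry["affected_below"]

print(locally_affected(example_entry, "brpc", "1.5.0"))  # True
```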
Agreed, but this doesn't fully sidestep the privacy leakage, as the access patterns will probably leak a lot of information at the start.
I'm strongly with @raboof on having a focus on local datasets.
I've mentioned this on Matrix, but existing vulnerability scanners are waiting for, at minimum, a list of cases where packages are fixed:
https://github.com/aquasecurity/trivy/issues/1673#issuecomment-1234273374
The next useful part (which other entities could also produce, but which would still be useful) is a mapping between our packages and a common identifier such as PURL or CPE.
In the case of CVE-2023-45757, the NVD uses cpe:2.3:a:apache:brpc:*:*:*:*:*:*:*:* (up to and excluding 1.6.1) to identify what it's talking about, while the OSV people point the CVE at the repo itself, https://github.com/apache/brpc, and at commit 14a18bcac6fca60fa8e6e4b3d222d6209de947c3 where it was fixed. For mapping to the packages in repos: we call it apache-httpd, Ubuntu calls it apache2, etc.
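A sketch of what such a mapping could look like on our side; the table entries and the "nix" purl type are illustrative assumptions, not an existing dataset or a registered purl type:

```python
# Curated table from nixpkgs pnames to the identifiers other ecosystems use:
# CPEs for NVD, the upstream repo for OSV-style matching, and the names
# other distributions use for the same software.
MAPPING = {
    "brpc": {
        "cpe": "cpe:2.3:a:apache:brpc:*:*:*:*:*:*:*:*",
        "upstream": "https://github.com/apache/brpc",
    },
    "apache-httpd": {
        "cpe": "cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*",
        "other_names": {"ubuntu": "apache2", "debian": "apache2"},
    },
}

def to_purl(pname: str, version: str) -> str:
    """Build a package URL; 'nix' is used here as a hypothetical purl type."""
    return f"pkg:nix/{pname}@{version}"

print(to_purl("brpc", "1.6.0"))  # pkg:nix/brpc@1.6.0
```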
Many people will also be stuck with a particular scanner: SonarQube, Aqua Security, Prisma Cloud/Twistlock, Snyk, various osqueryd vendors; the list goes on.
Yes, having another independent open-source scanner is cool and all, but there are already general-purpose open scanners such as trivy, grype, clamav, and clair. The important part is providing the data that can be consumed by the tooling end users actually have to use.
If the dataset is too unique, or depends on submitting individual queries to APIs, other tools won't use it, and the vast majority of users will IMO stay in the exact same position where they can't get approval to use Nix for commercial projects. And if companies did end up hammering these APIs rather than consuming a dataset, I could see that getting expensive.
Sure, but I have to warn here: this is not a task to build solutions for commercial entities. If commercial entities are stuck with inappropriate security scanners, I sympathize, but they can sponsor the work to fix this.
This is about proper security for our community.
Right, I'm not saying we should go out of our way to help commercial entities; I just want to make sure we don't make security scanning impossible for anything other than very bespoke tooling. We can also piggyback off existing Open Source solutions such as Trivy/Grype, with Grype being the most Nix-friendly so far. If the focus is on item 1, the data is useful for everyone; item 2 is more focused on being helpful for other tooling, but it is also helpful for anyone, and again it could be ignored.
I'm sure we can do additional cool stuff that's very Nix-specific, but it's important to solve the baseline use case of: I have a package, I think it matches this CVE; is that patched?
We can also piggyback off existing Open Source solutions
Believe me, writing custom code is generally not my first choice, and we will piggyback as much as possible where it helps :). However, in this case I think connecting with tools such as Trivy/Grype will hold us back more than it would speed us up. You correctly identify a number of the large tasks ahead of us, and you observe that those tools are not particularly good at them right now. I think having a bespoke tool will allow us to iterate faster and learn what is needed to take on these challenges. Once we have a working setup, I would be delighted if we could generalize it and connect it to the various existing tooling available in the ecosystem. If that eventually means the bespoke part is no longer needed, even better; but starting with that would only slow us down.
it's important to solve the baseline use case of: I have a package, I think it matches this CVE; is that patched?
Exactly - the goal isn't to do 'additional cool nix-specific stuff', but to do 'the minimum amount of nix-specific stuff to get to the goal'.
I'm sure we can do additional cool stuff that's very Nix-specific, but it's important to solve the baseline use case of: I have a package, I think it matches this CVE; is that patched?
Yep, but in general, all you can do is look at the patches, and that's it. Everything else would have to use a structured database that understands nixpkgs Git history, and then we are back to this platform.
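One concrete (and limited) form of "looking at the patches" is checking whether a package's derivation references a patch that carries the CVE id in its filename. This is only a heuristic sketch: it misses patches not named after the CVE, and it assumes a flake-enabled Nix with the "nix derivation show" command (older releases call it "nix show-derivation"):

```python
import json
import subprocess

def patches_mention_cve(attr: str, cve: str) -> bool:
    """Heuristic: does the derivation's patch list name this CVE?"""
    out = subprocess.run(
        ["nix", "derivation", "show", f"nixpkgs#{attr}"],
        check=True, capture_output=True, text=True,
    )
    for drv in json.loads(out.stdout).values():
        # stdenv.mkDerivation exposes `patches` as a space-separated env string
        patches = drv.get("env", {}).get("patches", "")
        if cve.lower() in patches.lower():
            return True
    return False

# Example call; whether it returns True depends on the nixpkgs revision.
print(patches_mention_cve("openssl", "CVE-2023-0464"))
```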
(Personal notes; will be expanded upon later.) In the event that someone writes a daemon or a tool to continuously scan NixOS closures for security vulnerabilities, it would be interesting to coordinate via the security tracker itself and offer a way to tap into its database.
For this, it is critical to enrich the security tracker data so it knows things like "this security issue is related to this set of packages" and "this set of packages is related to those .drv or store paths", and to make the metadata clear about it. A tool could then hypothetically send a compressed set of store paths composing the system and get back a response about the potential vulnerabilities affecting such a system.
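A minimal sketch of what that exchange could look like from the client side; the endpoint URL and the response format are placeholders, since no such API exists yet:

```python
import gzip
import json
import subprocess
import urllib.request

TRACKER_URL = "https://tracker.example.org/api/scan"  # placeholder endpoint

def system_store_paths() -> list[str]:
    """Collect the runtime closure of the current NixOS system."""
    out = subprocess.run(
        ["nix-store", "--query", "--requisites", "/run/current-system"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.splitlines()

def query_tracker(paths: list[str]) -> dict:
    """Send the compressed set of store paths and return the tracker's verdict."""
    body = gzip.compress(json.dumps({"store_paths": paths}).encode())
    req = urllib.request.Request(
        TRACKER_URL,
        data=body,
        headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(query_tracker(system_store_paths()))
```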
A lot has to be figured out on:
Obviously, situations like "you overrode your package", where the store path cannot be recognized because you are compiling it yourself, are unavoidable and out of scope for this project.