Virus scanning of source code/released source/binaries

kurtseifried commented 6 years ago

I know it is generally very unlikely for an Open Source app to ship a virus (the code review really should catch it...) but virus scanning is pretty easy using free and OpenSource tools like ClamAV, or free tools like virustotal.com (which has a scanning API that could be integrated as part of the release CI build process for example).

Is there a specific reason that virus scanning (with e.g. ClamAV or virustotal.com) is not part of the best practices? If nothing else it will ensure any false positives are dealt with proactively (e.g. ClamAV used to have a double zip test that triggered some false positives and panicked more than a few people responsible for the security of released software, myself included).

I would suggest something as simple as "Scan source code, and/or releases with anti virus (such as ClamAV) and consider leaving a statement as to when the source code/release was scanned, with what scanner, and with what version of the detection database".
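A minimal sketch of how such a statement could be generated, assuming ClamAV's `clamscan` is on the PATH. The function names and the wording of the statement are hypothetical, not part of any proposed criterion; the sketch relies on `clamscan --version` printing the engine version, database version, and database date separated by slashes.

```python
import datetime
import subprocess


def parse_clamav_version(version_line: str) -> dict:
    """Split `clamscan --version` output into engine and database fields.

    The output normally looks like
    "ClamAV 0.103.2/26200/Mon Jun  7 12:00:00 2021"; when no database is
    installed, only the engine part is printed.
    """
    parts = version_line.strip().split("/", 2)
    engine = parts[0].replace("ClamAV", "", 1).strip()
    return {
        "engine": engine,
        "db_version": parts[1] if len(parts) > 1 else "unknown",
        "db_date": parts[2] if len(parts) > 2 else "unknown",
    }


def scan_statement(target: str, version_line: str, infected: bool,
                   when: datetime.datetime) -> str:
    """Build a one-line statement: what was scanned, when, and with what."""
    v = parse_clamav_version(version_line)
    result = "infected files found" if infected else "no known malware found"
    return (
        f"{target} scanned {when:%Y-%m-%d %H:%M} UTC with ClamAV {v['engine']} "
        f"(database {v['db_version']}, {v['db_date']}): {result}"
    )


def scan_release(target: str) -> str:
    """Run clamscan on `target` and return a statement (needs ClamAV installed)."""
    version_line = subprocess.run(
        ["clamscan", "--version"], capture_output=True, text=True, check=True
    ).stdout
    # clamscan exits 0 when clean, 1 when something was found, 2 on errors.
    proc = subprocess.run(
        ["clamscan", "--recursive", "--infected", target],
        capture_output=True, text=True,
    )
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"clamscan failed: {proc.stderr.strip()}")
    now = datetime.datetime.now(datetime.timezone.utc)
    return scan_statement(target, version_line, proc.returncode == 1, now)
```

A release job could call `scan_release("dist/")` (the directory name is an assumption) and publish the returned line next to the release notes.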

david-a-wheeler commented 6 years ago

On December 1, 2017 11:19:34 PM EST, Kurt Seifried notifications@github.com wrote:

Welcome! Always glad to see you.

I know it is generally very unlikely for an Open Source app to ship a virus (the code review really should catch it...) but virus scanning is pretty easy using free and OpenSource tools like ClamAV, or free tools like virustotal.com (which has a scanning API that could be integrated as part of the release CI build process for example).

Is there a specific reason that virus scanning (with e.g. ClamAV or virustotal.com) is not part of the best practices?

We generally tried to capture what a number of well-run OSS projects were already doing, and I don't think there are many projects that do that. Do you know of a few? Which ones?

I'm not sure how much this would help:

If the project/lead is actually malicious (yikes), then they can easily avoid triggering their own tests. So AV can only deal with unintentionally inserting malicious code. I mention this possibility just for completeness.
malicious code written by the project is very unlikely to be detected by AV, since AV normally only detects already known malicious code / behaviour. As you noted, that should be detected by source code review.
that leaves malicious code not written by the project, but distributed by the project. I think the primary cause of this would be subverted convenience binaries distributed by the project. That said, many OSS projects today do not distribute convenience binaries. In most cases I've seen where an OSS project distributed subverted binaries, the subversion happened at the distribution stage (which is countered by digital signing and https, not build-time AV). Even if you ran a build-time AV, AV effectiveness has been decreasing due to more polymorphic viruses and more complex behaviour.

It's not a completely crazy idea. I just wonder how helpful/effective it would be. I agree that it would not be too hard to add ClamAV or similar to a test suite. If we added something like this, I think it should start at a higher level (silver or gold) and later discuss moving it down.

I'd love to hear others' thoughts.

--- David A. Wheeler

kurtseifried commented 6 years ago

We generally tried to capture what a number of well-run OSS projects were already doing, and I don't think there are many projects that do that. Do you know of a few? Which ones?

I know existing tools already do this, e.g. rpmgrill:

https://github.com/default-to-open/rpmgrill/blob/master/lib/RPM/Grill/Plugin/VirusCheck.pm

and I know many "commercial" open source companies scan stuff before release.

I'm not sure how much this would help:

If the project/lead is actually malicious (yikes), then they can easily avoid triggering their own tests. So AV can only deal with unintentionally inserting malicious code. I mention this possibility just for completeness.

So if we're dealing with an actively malicious coder then that's somewhat out of scope I would say (e.g. they can embed bitcoin miners into stuff to monetize it, is that malicious?). I would point out that using something like virustotal means you could check the signature with a link, e.g.:

https://www.virustotal.com/#/file/d3c34d1d2ba9342284b522a628fe73c24d8c99acbdeb5eb450f068e25b08175e/detection

malicious code written by the project is very unlikely to be detected by AV, since AV normally only detects already known malicious code / behaviour. As you noted, that should be detected by source code review.

However it's also useful longer term to see how long a project was compromised for example (I'm betting if someone downloads github/rubygems/nodejs/etc and virus scans it they'll find at least one).

that leaves malicious code not written by the project, but distributed by the project. I think the primary cause of this would be subverted convenience binaries distributed by the project. That said, many OSS projects today do not distribute convenience binaries. In most cases I've seen where an OSS project distributed subverted binaries, the subversion happened at the distribution stage (which is countered by digital signing and https, not build-time AV). Even if you ran a build-time AV, AV effectiveness has been decreasing due to more polymorphic viruses and more complex behaviour.

It's not a completely crazy idea. I just wonder how helpful/effective it would be. I agree that it would not be too hard to add ClamAV or similar to a test suite. If we added something like this, I think it should start at a higher level (silver or gold) and later discuss moving it down.

One thing I would suggest: this isn't hard to do. Virustotal, for example, has an API, and integration with CI is relatively straightforward. Essentially the cost of doing this is quite low, there should be a pretty low rate of false positives for most projects, and there is the tangible benefit of not making end users deal with false positives (assuming you fix them), plus the added assurance that the OpenSource is not infected with known stuff.

david-a-wheeler commented 6 years ago

I know existing tools already do this, e.g. rpmgrill: https://github.com/default-to-open/rpmgrill/blob/master/lib/RPM/Grill/Plugin/VirusCheck.pm and I know many "commercial" open source companies scan stuff before release.

Can you list a few project names (with evidence that they do it)? Does some distro (like Fedora) use rpmgrill with this plug-in? "It's already being done" would be a strong argument for this.

One thing I would suggest: this isn't hard to do. Virustotal, for example, has an API, and integration with CI is relatively straightforward. Essentially the cost of doing this is quite low, there should be a pretty low rate of false positives for most projects...

Virustotal is definitely an easy way to do a lot, and that definitely helps make this easier to do. I do wonder about the Virustotal terms of service, which say:

When you upload or otherwise submit content, you give VirusTotal (and those we work with) a worldwide, royalty free, irrevocable and transferable licence to use, edit, host, store, reproduce, modify, create derivative works, communicate, publish, publicly perform, publicly display and distribute such content.

That's fine for something under MIT or BSD-3-Clause licenses, but a project that uses licenses like Apache, LGPL, or GPL might hesitate to agree to that.

That said, they could always use another tool like ClamAV.

and there is the tangible benefit of not making end users deal with false positives (assuming you fix them), and the added assurance that the OpenSource is not infected with known stuff.

The "not infected with known stuff" is what I meant earlier (I agree with you there). The "not making end users deal with false positives" is a good point, that's a reasonable additional reason.

So: Can you identify a few specific projects that are already doing this, with some evidence (URLs)? I think that's key to the next step (working out draft language to possibly add). If we added this, I think it should start at the higher levels (probably silver, since it's not hard to do).... we can then move the criterion down over time.

david-a-wheeler commented 6 years ago

If we do add this, we'll need to put in a few caveats:

Clearly this only applies if the project distributes compiled results (including "convenience binaries" and JAR files).
It should only apply if there are FLOSS tools that can implement the criterion (similar to other tool-based criteria)
We'll need to make some sort of exception for intentionally-malicious code. Some FLOSS projects include test files of malicious code (e.g., AV tools need malicious code for their test suites), and a few FLOSS projects exist specifically to create malicious code for use as tests (I used one in my PhD dissertation). Such projects will need to clearly document why they have this malicious code in such a way that users won't accidentally run the malicious code, and where possible the code should be written to be harmless in practice.
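The exception for intentionally-malicious test files could be handled by filtering scanner findings against a documented allowlist. A rough sketch, assuming the scanner is ClamAV, whose `clamscan` reports each detection as a line of the form `<path>: <signature> FOUND`. The function names, file paths, and allowlist format are hypothetical.

```python
def parse_findings(clamscan_output: str) -> dict:
    """Map each flagged path to the signature name clamscan reported for it."""
    findings = {}
    for line in clamscan_output.splitlines():
        if line.endswith(" FOUND"):
            # Strip the trailing marker, then split path from signature name.
            path, _, signature = line[: -len(" FOUND")].rpartition(": ")
            findings[path] = signature
    return findings


def unexpected_findings(findings: dict, documented: dict) -> dict:
    """Drop detections for paths the project has documented as intentional.

    `documented` maps a path to the written reason it contains malicious
    code, so every exception carries its own justification.
    """
    return {path: sig for path, sig in findings.items() if path not in documented}
```

A build would then fail only when `unexpected_findings` returns a non-empty result, while the documented entries double as the explanation users need.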

If we do this, I expect that we'd use our usual approach. That is, we'd first implement this as a "future" criterion (which lets people enter data but doesn't count for the badge achievement), presumably at the "silver" level.

david-a-wheeler commented 6 years ago

FYI, there's a new paper "Malware Detection by Eating a Whole EXE" which does static analysis for malware detection from raw byte sequences using machine learning (specifically, a neural network). I don't know if there's an OSS implementation; I've emailed the authors.

david-a-wheeler commented 6 years ago

The example you gave:

https://www.virustotal.com/#/file/d3c34d1d2ba9342284b522a628fe73c24d8c99acbdeb5eb450f068e25b08175e/detection

is for Audio Hijack, a proprietary program by Rogue Amoeba for Macs.

Can you point to projects/distros/repos that do this for OSS?

kurtseifried commented 6 years ago

So Fedora uses taskotron with rpmgrill to scan for viruses, e.g.:

https://taskotron.fedoraproject.org/artifacts/all/562d1962-d7b3-11e7-986b-525400817a8f/task_output/rpmgrill.json

As for the example I gave, that simply happened to be the last binary I've scanned with virustotal (so first link in my web browser history =).

As for the license thing, the problem is that Google has to take a copy of that file, and most likely stores it on their end for some time (e.g. if you rescan something it uploads the hash first and says "oh I already saw this" and you don't need to re-upload). I assume they also hand interesting files off to AV people to generate signatures/etc. So yes the license thing is definitely a concern but hopefully the "don't be evil" thing keeps going. Also please keep in mind anyone can submit this stuff to google (and no I'm not trying to make the "well if we don't murder these kittens someone else will" argument, I'm just making sure it's pointed out =).
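The hash-first lookup can be illustrated without uploading anything: compute the file's SHA-256 locally and build a link to the existing report. This is only a sketch. The URL pattern is copied from the link quoted earlier in this thread and may not match VirusTotal's current web interface, and the function names are hypothetical.

```python
import hashlib

# URL pattern taken from the example link in this thread (an assumption,
# not a documented VirusTotal API endpoint).
VT_URL = "https://www.virustotal.com/#/file/{sha256}/detection"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large release artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def virustotal_link(path: str) -> str:
    """Return a report link for the file; nothing is uploaded."""
    return VT_URL.format(sha256=sha256_of_file(path))
```

Because only the hash leaves the machine, this kind of lookup sidesteps the upload licence question, but it only finds a report if someone has already submitted the same file.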

kurtseifried commented 6 years ago

It should only apply if there are FLOSS tools that can implement the criterion (similar to other tool-based criteria)

I know Fedora/Red Hat have complete tool chains for this and they are of course OpenSource. The main thing needed is a fully open source AV, which we have.

david-a-wheeler commented 6 years ago

Pointing to Fedora definitely helps the case. I will try to craft some text based on what you said and also try to make the text more like the existing criteria. That is a necessary step no matter what, and it's a lot easier to argue something if you know exactly what it is.

There is a potential counter argument, namely, that Fedora is a distribution and not an individual project. The best practices criteria are really intended for specific projects. Can you identify any specific open source software projects that already do this scanning on their own executables? That would create a stronger argument.

david-a-wheeler commented 6 years ago

It is not clear to me if the virustotal license is a real problem or not. If you squint hard, I suspect you could argue that all open source software meets those requirements since it can be modified and redistributed. On the other hand, most licenses have additional requirements such as attribution. I have no interest in being the world's lawyer, especially since I am not a lawyer to start with. I think we can simply reference virustotal and mention its licensing terms, and then let people make up their own minds.

kurtseifried commented 6 years ago

There is a potential counter argument, namely, that Fedora is a distribution and not an individual project. The best practices criteria are really intended for specific projects. Can you identify any specific open source software projects that already do this scanning on their own executables? That would create a stronger argument.

Not offhand, searching google yields a lot of "how do I scan for viruses" and not much "we are scanning for viruses". There are definitely people asking for this type of integration with CI, e.g.:

https://github.com/travis-ci/travis-ci/issues/8031

And Jenkins has a ClamAV plugin already:

https://wiki.jenkins.io/display/JENKINS/ClamAV+Plugin

So I assume there is demand because 1) I see Fedora/etc doing it and 2) people are writing plugins for CI/asking for them.

david-a-wheeler commented 6 years ago

I haven't found much in the way of specific OSS projects that do virus scanning of their software before they send it out either. Here are some things I note:

CCleaner's subversion is clearly an example where something malicious was inserted by an attacker, but it's not clear that virus scanning at the supplier would have helped.
Ion Channel does virus checking (among other things) on open source software, but that's at the ingest side, not the supply side.

We can work out some criteria text, especially since it's not that hard to do & we can at least make the argument that Fedora does it. But it'd be a lot stronger argument if we could identify some OSS projects (preferably well-known ones) that are already doing this.

dankohn commented 6 years ago

I would suggest that the lack of examples brings into question whether virus checking represents a best practice or not. I don't believe the purpose of the badge is to be prescriptive on policies that are not already in use by the best run projects.

-- Dan Kohn dan@linuxfoundation.org Executive Director, Cloud Native Computing Foundation https://www.cncf.io +1-415-233-1000 https://www.dankohn.com

On Fri, Dec 8, 2017 at 6:42 PM, David A. Wheeler notifications@github.com wrote:

I haven't found much in the way of specific OSS projects that do virus scanning of their software before they send it out either. Here are some things I note:

CCleaner's subversion https://www.wired.com/story/ccleaner-malware-supply-chain-software-security/ is clearly an example where something malicious was inserted by an attacker, but it's not clear that virus scanning at the supplier would have helped.

Ion Channel does virus checking (among other things) on open source software http://www.ionchannel.io/technology/, but that's at the ingest side, not the supply side.

We can work out some criteria text, especially since it's not that hard to do & we can at least make the argument that Fedora does it. But it'd be a lot stronger argument if we could identify some OSS projects (preferably well-known ones) that are already doing this.


david-a-wheeler commented 6 years ago

I would suggest that the lack of examples brings into question whether virus checking represents a best practice or not.

Yes, that is exactly my point. I don't mind entertaining the discussion while we try to track that down, but our goal is to capture what has already been proven to be a good idea, not to guess what unproven ideas might happen to work.

david-a-wheeler commented 6 years ago

I do like the idea of using this as a test so that end-users are less likely to have false positives. That's probably not enough, unless we can show that there are a number of projects that do do this.

Running an AV does seem more like a demand-side (ingest) activity, not a supply-side (production) activity, in support of supply chain risk management (SCRM). I don't see anything specific to OSS that covers this. OpenChain doesn't seem to cover this (it's more focused on licenses). Maybe OpenChain would be interested?

david-a-wheeler commented 6 years ago

The Cyber Risk Predictive Analytics Project is work that quantitatively analyzed various practices.

64% of the respondents said "Yes" to this:

Do you quarantine code from outside suppliers in proxy servers to undergo virus scanning and authentication procedures?

But again, this is on the demand side (ingest/ingress), not the OSS project side.

david-a-wheeler commented 6 years ago

A few interesting points:

We have discussed something like this before, see issue #688
SourceForge now scans all projects for malware. SourceForge isn't as popular as it once was, but I guess it could be argued as an example of a case where scanning is happening at the supply side (instead of the ingest side).

I'm still a little skeptical, but I figure we'll slowly capture information and make decisions later.

coreinfrastructure / best-practices-badge

Virus scanning of source code/released source/binaries #987