OpenSCAP / openscap

NIST Certified SCAP 1.2 toolkit
https://www.open-scap.org/tools/openscap-base
GNU Lesser General Public License v2.1

oscap performance questions #900

Status: Closed (by yavor-atanasov, 7 months ago)

yavor-atanasov commented 6 years ago

Hey guys,

First of all, apologies for raising this as an issue, but I could not find relevant documentation on the topic. We are evaluating OpenSCAP to monitor a fairly wide fleet of services across our organisation, and one of the things we are looking at is how to run it.

First tests show a significant performance hit (lasting minutes) on the instance oscap runs on:

$ wget https://www.redhat.com/security/data/oval/com.redhat.rhsa-all.xml
$ oscap oval eval --results results.xml --report report.html com.redhat.rhsa-all.xml

This makes us believe that running oscap directly on each instance is a no-go.

So I guess my questions are (you can also point me to documentation I might have missed):

Thank you for your help,
Yavor

matejak commented 6 years ago

Hello, thank you for using OpenSCAP! The nature of your issue is quite broad, so please give us some time to gather the needed knowledge. We will get back to you as soon as we are ready.

yavor-atanasov commented 6 years ago

Thank you @matejak

Just to clear up some obvious questions you might have: the machine I tested on wasn't unusual in any way. I used an idle VirtualBox CentOS 7 machine with one dedicated CPU and 2 GB of memory, with roughly 600 RPMs installed. (The host machine uses an SSD and an i5-5287U CPU @ 2.90 GHz.) This is what top shows (the scan completes in ~3 minutes; memory usage hovers around 15-16% and CPU around 90-99%):

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 4085 develop+  20   0  448720 333188   4252 S 99.9 16.3   3:10.44 oscap

With that performance profile it is infeasible to run it on production AWS instances, especially smaller instance types. This is by no means criticism; we would just like to understand it better so we can pick the most efficient and scalable way of using it.

Thanks again,
Yavor

jan-cerny commented 6 years ago

Hi @yavor-atanasov ,

Thank you for opening the issue. There's no need to apologize. Actually I think your concern is very important.

What is the expected performance of an oscap scan?

The performance strongly depends on the SCAP content. SCAP content is the input file consumed by oscap. Performance depends not only on how big the file is, but also on what is written inside it. In general, the input file can instruct oscap to do very easy tasks, e.g. reading a value from a single config file, but it can also instruct it to do resource-demanding tasks, like checking the permissions of every file in the whole root filesystem.

In your example, you're using a file that describes vulnerable RPM packages. This file is generated by Red Hat from its security advisories and contains a list of every package vulnerability ever published for RHEL.

Unfortunately this file is very big, and it will only get bigger in the future as new vulnerabilities appear. I can see this becoming a big problem. Our implementation is not optimized to handle such big files: it keeps the whole XML tree in memory. I think using some sort of SAX (streaming) parser would improve the performance greatly, but that would mean a large rewrite of our codebase.

Specifically for oval checks - what is the tool actually doing (is it running checks against the local rpm database, or also scanning individual files on the system)

During this scan, oscap talks to the local RPM database and compares the versions in the file with the versions of the packages on your system. In this case oscap doesn't read individual files on the system.
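To illustrate the kind of comparison involved, here is a rough Python sketch of RPM-style version comparison. Note that `rpm_vercmp` below is a simplified re-implementation of the algorithm librpm uses (it ignores tilde/caret handling and other corner cases), not oscap's actual code:

```python
import re

def rpm_vercmp(a: str, b: str) -> int:
    """Compare two RPM version strings segment by segment.
    Simplified sketch of librpm's rpmvercmp; returns -1, 0, or 1."""
    # Split into runs of digits and runs of letters; other chars separate segments.
    seg = re.compile(r"\d+|[a-zA-Z]+")
    sa, sb = seg.findall(a), seg.findall(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            # Numeric segments compare as integers, so "1.10" > "1.9".
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # In RPM ordering, a numeric segment beats an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return 1 if x > y else -1
    if len(sa) != len(sb):
        return 1 if len(sa) > len(sb) else -1
    return 0

def evr_cmp(a, b) -> int:
    """Compare (epoch, version, release) tuples the way RPM does:
    epoch first, then version, then release."""
    ea, va, ra = a
    eb, vb, rb = b
    if ea != eb:
        return 1 if ea > eb else -1
    return rpm_vercmp(va, vb) or rpm_vercmp(ra, rb)
```

An installed package whose EVR compares less than the "fixed-in" EVR from an advisory is flagged as vulnerable; no files outside the RPM database need to be read for this.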

Is there a documented, recommended pattern/approach for setting up oscap as a continuous monitoring solution?

You can schedule regular scans in your cron jobs.
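For example, a crontab entry along these lines would run a weekly scan (the schedule and file paths here are just an illustration, not a recommended layout):

```shell
# Illustrative crontab entry: run an OVAL scan every Sunday at 02:00
# and keep the results and report. Paths and schedule are assumptions.
0 2 * * 0 /usr/bin/oscap oval eval --results /var/log/oscap/results.xml --report /var/log/oscap/report.html /var/lib/oscap/com.redhat.rhsa-all.xml
```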

You might also consider Red Hat Satellite, which has OpenSCAP plugin. Nice demo here: https://www.youtube.com/watch?v=p4uNlzYld-Y

We developed OpenSCAP Daemon (https://github.com/OpenSCAP/openscap-daemon), which was meant to do continuous monitoring. However, there has been no strong demand for this feature so far. It is still a tech preview and we are not currently working on OpenSCAP Daemon, so I wouldn't recommend it for production. If you consider OpenSCAP Daemon a good idea, contributions are welcome :-)

jan-cerny commented 6 years ago

To improve performance, try using the --skip-valid option, which skips XML validation of the input file before the scan.

yavor-atanasov commented 6 years ago

Thank you both @jan-cerny and @matejak

I think we prefer running the OVAL analysis in a separate service rather than having oscap run on each production machine. AWS Inspector uses a similar approach: the required data is streamed from the monitored machines to a dedicated service, which then analyses it and generates the results. This approach is less taxing in terms of machine resources.

We looked into oscap-chroot to do the scans against chroots created with the required RPM databases, but having to juggle whole filesystems just to scan a list of packages seemed like overkill. (Even though it looks like the chroot can contain as little as /var/lib/rpm and /usr/lib/rpm, it feels hacky to synthetically construct these partial chroots.)

Anyway, instead we looked into the OVAL spec, and more specifically the subset of it that the OVAL files generated by Red Hat seem to use, and we wrote a simple parser that takes the required RPM data as input in JSON format, e.g.:

...
  "bash": {
    "signature_keyid": "24c6a8a7f4a80eb5",
    "epoch": 0,
    "version": "4.2.46",
    "release": "29.el7_4",
    "evr": "4.2.46-29.el7_4",
    "arch": "x86_64"
  },
  "glibc-common": {
    "signature_keyid": "24c6a8a7f4a80eb5",
    "epoch": 0,
    "version": "2.17",
    "release": "196.el7",
    "evr": "2.17-196.el7",
    "arch": "x86_64"
  },
...

and then runs the scan against one of these ovals: https://www.redhat.com/security/data/oval/Red_Hat_Enterprise_Linux_6.xml https://www.redhat.com/security/data/oval/Red_Hat_Enterprise_Linux_7.xml

This will allow us to build a service fronted by a simple REST API that accepts JSON package metadata (for a given machine image or instance) and publishes the results against the current OVAL files. (Our initial tests show that if we parse the XML once and keep it in memory, we can run a scan in ~0.06 seconds, which means we can even serve results synchronously with the HTTP requests.)
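The core of such a service can be sketched in a few lines of Python. Everything below is hypothetical: the advisory ID and "fixed-in" EVR stand in for data that would be parsed once from the Red Hat OVAL files, and the version comparison is a naive stand-in for librpm's rpmvercmp, not a faithful re-implementation:

```python
import re

# Hypothetical rule table standing in for data parsed out of the Red Hat
# OVAL files. Maps advisory -> {package name: (epoch, version, release)
# that fixes it}. The IDs and versions are made-up examples.
RULES = {
    "RHSA-0000:0000 (example)": {"bash": (0, "4.2.46", "30.el7")},
}

def _segments(s):
    # Split "29.el7_4" into comparable segments; digit runs compare as
    # integers. Tagging each segment (0 for numeric, 1 for alpha) keeps
    # mixed lists comparable. Naive stand-in for librpm's rpmvercmp.
    return [(0, int(t)) if t.isdigit() else (1, t)
            for t in re.findall(r"\d+|[a-zA-Z]+", s)]

def older_than(installed, fixed):
    """True if the installed (epoch, version, release) predates the fix."""
    ie, iv, ir = installed
    fe, fv, fr = fixed
    return (ie, _segments(iv), _segments(ir)) < (fe, _segments(fv), _segments(fr))

def scan(packages):
    """packages: dict in the JSON shape shown above.
    Returns (advisory, package) pairs where the installed version
    is older than the version that fixes the advisory."""
    findings = []
    for advisory, fixes in RULES.items():
        for name, fixed_evr in fixes.items():
            pkg = packages.get(name)
            if pkg is None:
                continue
            installed = (pkg["epoch"], pkg["version"], pkg["release"])
            if older_than(installed, fixed_evr):
                findings.append((advisory, name))
    return findings
```

With the rule table built once at startup, each request is just a dictionary walk plus string comparisons, which is consistent with the sub-100ms per-scan figure mentioned above.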

We'll try to make that tool/service public.

By no means does this mean that we looked at oscap and didn't like it. It just felt like our use case requires us to run a subset of what oscap does, and to do it more flexibly (against RPM metadata in a canonical format, as opposed to a whole filesystem) and faster.

Thank you again,
Yavor