Partial Credit in CWEs scoring

njshanahan commented 3 years ago

The HARD pipelines mitigated one or more parts of each of the following Resource Management tests; however, the reported score for each was HIGH vulnerability. Given the Buffer Error scoring range philosophy, shouldn't these tests have scored LOW or MED?

| Test     | Overall score | Part scores                                       |
|----------|---------------|---------------------------------------------------|
| TEST-672 | HIGH          | heapUsePostRelease:HIGH, stackUsePostRelease:NONE |
| TEST-761 | HIGH          | p01:NONE, p02:NONE, p03:HIGH, p04:DETECTED        |
| TEST-763 | HIGH          | p01:DETECTED, p02:HIGH                            |
| TEST-770 | HIGH          | heapExhaust:DETECTED, stackExhaust:HIGH           |
| TEST-789 | HIGH          | heapExhaust:DETECTED, stackExhaust:HIGH           |
| TEST-825 | HIGH          | p01:DETECTED, p02:HIGH, p03:DETECTED              |

I would have expected the first five tests to score MED, because 50% of the vulnerable test components were addressed (CWE-761 includes two benign parts). Similarly, I would have expected CWE-825 to score LOW, because 67% of the test parts were mitigated. I appreciate your help.

Tagging @austinhroach for awareness.

rtadros125 commented 3 years ago

This is expected behavior:

Some test parts are validation tests that just check that the API/function used by the test works fine.
A CWE can manifest in different ways. Say A and B are different instances of the same CWE. If a processor thwarts instance A but cannot detect B, could we claim that it protects against that CWE? The answer is no. How to give partial credit is thus a subjective matter. These tests are scored on a "worst case" basis: if any instance exhibits a HIGH vulnerability of type CWE-X, then the CWE-X score is HIGH. This choice was motivated by wanting to be conservative in making security claims.
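
To make the "worst case" rule concrete, here is a minimal Python sketch. It is not the tool's actual scoring code: the `SEVERITY` ranking and the `worst_case_score` helper are hypothetical names, and the ordering of the score levels is an assumption made only for illustration.

```python
# Hypothetical severity ranking (an assumption, not taken from the tool):
# a higher number means a more severe result.
SEVERITY = {"NONE": 0, "DETECTED": 1, "LOW": 2, "MED": 3, "HIGH": 4}


def worst_case_score(part_scores):
    """Return the most severe score among the parts of one test."""
    return max(part_scores.values(), key=SEVERITY.__getitem__)


# Part scores for TEST-825 as reported in this issue.
test_825 = {"p01": "DETECTED", "p02": "HIGH", "p03": "DETECTED"}
print(worst_case_score(test_825))  # prints "HIGH"
```

Under this rule a single HIGH part decides the overall score, regardless of how many other parts were mitigated.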

njshanahan commented 3 years ago

Understood. However, consider CWE-825. The stack component of the test was specifically added per our request (Issue #1044) to reflect the capabilities of the HARD solution. If this is a valid variant of the CWE, should the score not reflect a successful mitigation?

rtadros125 commented 3 years ago

You're making a valid point. This is a very subjective, non-empirical call. @austinhroach, please let us know if DARPA wants us to review all the multi-part tests and add a subjective weighted calculation to each CWE score.

njshanahan commented 3 years ago

@austinhroach Any thoughts regarding this?

austinhroach commented 3 years ago

These are both valid perspectives. Within DARPA, we've been discussing scoring for the purposes of reporting, and have come to the conclusion that it would be useful to have a "percentage of tests passed" score for each CWE, in addition to the binary "was this CWE mitigated in all tests" score.

For the purposes of the "percentage of tests passed" score, given the above example we would say that for CWE-825 67% of the tests passed. If this was still the case when final results were reported, we would want to see something like "67% of the tests for CWE-825 were passed. This CWE was not completely mitigated, and here's the reason why."
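
The two scores described above could be computed along these lines. This is only a sketch under stated assumptions: `PASSING`, `percent_passed`, `fully_mitigated`, and the `benign` parameter are hypothetical names, and treating NONE and DETECTED as "passed" is an assumption inferred from the percentages quoted in this thread, not a confirmed rule.

```python
# Assumption: a part counts as "passed" when it scores NONE or DETECTED.
PASSING = {"NONE", "DETECTED"}


def percent_passed(part_scores, benign=()):
    """Percentage of non-benign parts that passed (hypothetical helper)."""
    scored = [s for part, s in part_scores.items() if part not in benign]
    if not scored:
        raise ValueError("no scored parts")
    passed = sum(1 for s in scored if s in PASSING)
    return 100.0 * passed / len(scored)


def fully_mitigated(part_scores, benign=()):
    """Binary score: True only if every non-benign part passed."""
    return all(s in PASSING
               for part, s in part_scores.items() if part not in benign)


# Part scores for TEST-825 as reported in this issue.
test_825 = {"p01": "DETECTED", "p02": "HIGH", "p03": "DETECTED"}
print(f"{percent_passed(test_825):.0f}% of the tests for CWE-825 were passed.")
print("Completely mitigated:", fully_mitigated(test_825))
```

The optional `benign` argument is there so that validation-only parts, such as the two benign parts mentioned for CWE-761, can be left out of the denominator.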

rtadros125 commented 3 years ago

Roger that; I have changed the ticket title to be more generic. I will keep the per-part format as shown (PartX: score(partX), ...), but the overall score of the CWE will change to something similar to what Nicholas has described above.

GaloisInc / BESSPIN-Tool-Suite

Partial Credit in CWEs scoring #1185