Acceptance tests

harpocrates commented 7 years ago

We will communicate our expectations about TA1 providers' data to them via acceptance tests (basically unit tests that enforce the syntactic/semantic validity of their data, with respect to our understanding of it). These are to be delivered using the adapt-tester, and should be continuously refined until April.

This issue will track progress on this front and provide a place for keeping note of things suggested.

So far, we have come up with the following ideas of things to test as a bare minimum:

parsing
ground truth tests and assertions
TA1 specific tests
one of every CDM type (as makes sense per TA1) - aka the "Zen cheeseburger"
deduplication

Later on, once CDM14 is out, we will be able to leverage "epoch" also.

harpocrates commented 7 years ago

WRT the zen cheeseburger, these are the CDM statements that are missing for each TA1 (from the engagement data)

SOURCE_FREEBSD_DTRACE_CADETS: MemoryObject, ProvenanceTagNode, RegistryKeyObject, SrcSinkObject, TagEntity, Value
SOURCE_ANDROID_JAVA_CLEARSCOPE: MemoryObject, RegistryKeyObject, TagEntity, Value
SOURCE_WINDOWS_DIFT_FAROS: MemoryObject, RegistryKeyObject, TagEntity, Value
SOURCE_WINDOWS_FIVEDIRECTIONS: MemoryObject, TagEntity, Value
SOURCE_LINUX_THEIA: Value
SOURCE_LINUX_AUDIT_TRACE: Value

Obviously no one has implemented anything about Value, so it seems a bit foolish to have tests for that. That said, I'm a bit surprised SOURCE_WINDOWS_DIFT_FAROS doesn't have RegistryKeyObject. Should we include a test for this anyways?

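Also, it appears there are a bunch more sources (https://github.com/GaloisInc/adapt/blob/b78c202857bc873ccbbd1763060c2f6a4d9f7959/AdaptJVM/src/main/scala/com/galois/adapt/cdm13/CDM13.scala#L31-L43) for which we have no samples. What should the default be for those? Expect statements of every type?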
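
A minimal sketch of how the "one of every CDM type" check could be computed, assuming statements have already been parsed into `(source, cdm_type)` pairs. The function name, the input shape, and the expectation table are hypothetical and are not taken from the adapt-tester.

```python
from collections import defaultdict


def missing_types(statements, expected):
    """Report which expected CDM types never appeared for each source.

    statements: iterable of (source, cdm_type) pairs (hypothetical shape).
    expected:   dict mapping a source name to the set of CDM types we
                expect that source to emit at least once.
    Returns a dict {source: sorted list of missing types}, omitting
    sources that emitted everything expected of them.
    """
    seen = defaultdict(set)
    for source, cdm_type in statements:
        seen[source].add(cdm_type)

    report = {}
    for source, types in expected.items():
        gaps = types - seen[source]
        if gaps:
            report[source] = sorted(gaps)
    return report
```

Keeping the expectation table per source would let something like Value be left out of the expected set until a provider actually implements it.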

davearcher commented 7 years ago

Couple of points here…

Yes, not many folks had any memory objects in E1. Values were similarly unused, as their only application is for representing arguments to functions. I don’t think anyone was really equipped to do that. It doesn’t surprise me that FAROS is missing RegistryKeyObjects, esp. since they’re not monitoring Windows systems…

For CDM13, I think only 6 of the source values were used…1 for each TA1 provider. The others in your list appear to be sub-classes of those providers. I wonder whether that list is aspirational (if it came from the CDM13 spec), or whether it’s the CDM14 enum set (which is probably also aspirational).

Thanks! /d

rrwright commented 7 years ago

For now, I think we should immediately fail when detecting a source for which we haven't planned tests. I think Dave is right that CDM13 started off being a bit more aspirational. But I don't think most teams are going to add that in CDM 14. So let's just make an early test that fails (and the remainder abort?) unless the source data claims to be one of the types we plan to see.
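
A rough sketch of such an early check, assuming each parsed statement is a dict with a `source` field. That shape, the exception name, and the function name are illustrative only; the six source names are the ones listed earlier in this thread.

```python
# The six sources seen in the engagement data, as listed in this thread.
PLANNED_SOURCES = frozenset({
    "SOURCE_FREEBSD_DTRACE_CADETS",
    "SOURCE_ANDROID_JAVA_CLEARSCOPE",
    "SOURCE_WINDOWS_DIFT_FAROS",
    "SOURCE_WINDOWS_FIVEDIRECTIONS",
    "SOURCE_LINUX_THEIA",
    "SOURCE_LINUX_AUDIT_TRACE",
})


class UnplannedSourceError(Exception):
    """Raised when data claims a source we have not planned tests for."""


def check_sources(statements):
    """Fail on the first statement whose source is not planned.

    statements: iterable of dicts with a "source" key (hypothetical shape).
    Raising here lets the caller abort the remaining tests.
    """
    for index, stmt in enumerate(statements):
        source = stmt.get("source")
        if source not in PLANNED_SOURCES:
            raise UnplannedSourceError(
                f"statement {index}: unplanned source {source!r}"
            )
```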

davearcher commented 7 years ago

Agreed!

harpocrates commented 7 years ago

Consolidating from email, here are more (syntactic) tests to do eventually

A few below are labeled “HOLD”, because (as of Jeff’s recognition this morning that CDM is slightly broken) we need to let a fix settle out. The others should be good to go.

the UUID field of a PTN must not match that of any other PTN

the srcSinkObject field of a PTN must refer to a subject or object type dependent on the type of event pointed to by the event field, as follows

bind, connect, accept, sendmsg, recvfrom, sendto, recv events: srcSinkObject = NetFlow

open, close: srcSinkObject = File or NetFlow

read, write, unlink, create-object, dup, fcntl, mmap, modify attributes, truncate, update: srcSinkObject = File

mprotect, shm: srcSinkObject = Memory

change-principal, clone, create-thread, execute, fork, signal, unit, wait, exit: srcSinkObject = Subject, with subjectType = Process or Thread

the subject field of a PTN must refer to a subject, with subjectType = Process, Thread, or Unit

(HOLD this one - CDM currently under definition) the event field of a PTN must refer to an event object, where eventType is one of the above event types

(HOLD also) the subject field of a PTN must match the subject field of the event referred to in the event field

(HOLD also) the srcSinkObject field of a PTN must match the predicateObject field of the referenced event

the prevTagId field of a PTN must refer to another PTN, or be zero (the "no provenance" value)

(HOLD also) if the subject field of a PTN A is the same as the subject field of the PTN (B) referenced in its prevTagId field, then the sequence number of the event referred to in A must be less than the sequence number referred to in B

exactly one of the prevTagId field and the tagIds field must be non-null (that is, one of the two must be populated, but only one)

if opcode of a PTN is null, then tagIds must be empty (and thus prevTagId must not be null)

if opcode of a PTN is non-null, then it must be UNION

if PTN.prevTagId is not null, then the cTag and iTag values of the PTN must match those of the PTN referred to in prevTagId

if PTN.prevTagId is zero, then cTag and iTag must be populated (if the TA1 provider claims to support confidentiality and trustworthiness tagging)

(HOLD also) events of type read, write, sendmsg, sendto, recvfrom, recv, dup, mmap must all be referenced as the relevant event of some PTN

Each non-null parameter (Value record) in an event must contain at least 1 runLengthTuple

Each runLengthTuple in a Value record must have both a non-zero natural number (for length) and the UUID of a valid PTN
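
Two of the rules above can be sketched as follows: the event-type to srcSinkObject mapping, and the prevTagId / tagIds / opcode constraints. This assumes a PTN is available as a dict with `prevTagId`, `tagIds`, and `opcode` keys, that event types are the lowercase names used in this list, and that an empty `tagIds` counts as unpopulated. All of those are assumptions for illustration, not the CDM record layout. The subjectType check for Subject objects is left out.

```python
def _group(events, kinds):
    """Map each comma-separated event name to the allowed object kinds."""
    return {name: frozenset(kinds) for name in events.split(", ")}


# Allowed srcSinkObject kinds per event type, transcribed from the list above.
ALLOWED_SRC_SINK = {
    **_group("bind, connect, accept, sendmsg, recvfrom, sendto, recv",
             {"NetFlow"}),
    **_group("open, close", {"File", "NetFlow"}),
    **_group("read, write, unlink, create-object, dup, fcntl, mmap, "
             "modify attributes, truncate, update", {"File"}),
    **_group("mprotect, shm", {"Memory"}),
    **_group("change-principal, clone, create-thread, execute, fork, "
             "signal, unit, wait, exit", {"Subject"}),
}


def src_sink_ok(event_type, object_kind):
    """True if object_kind is an allowed srcSinkObject for event_type."""
    allowed = ALLOWED_SRC_SINK.get(event_type)
    return allowed is not None and object_kind in allowed


def tag_field_errors(ptn):
    """Return a list of violations of the prevTagId/tagIds/opcode rules."""
    errors = []
    # Zero is the "no provenance" value, so it counts as populated:
    # compare against None rather than relying on truthiness.
    has_prev = ptn.get("prevTagId") is not None
    has_tags = bool(ptn.get("tagIds"))
    if has_prev == has_tags:
        errors.append("exactly one of prevTagId and tagIds must be populated")
    opcode = ptn.get("opcode")
    if opcode is None and has_tags:
        errors.append("tagIds must be empty when opcode is null")
    if opcode is not None and opcode != "UNION":
        errors.append("opcode must be UNION when present")
    return errors
```

Returning a list of messages rather than a boolean would let a test report every violated rule for a PTN at once.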

rrwright commented 7 years ago

@harpocrates We need to add a test to ensure UUID uniqueness (ironically)—to ensure that only one node ever claims to have any given UUID. There is talk by TA1s about updating data by issuing a new event that reuses an old UUID (and so it would replace earlier data). We need to shut that down every way possible. So let's encode it in a test ASAP.
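
A minimal sketch of a uniqueness check, assuming every node is a dict carrying a `uuid` field (a hypothetical shape). It is only meant to illustrate the idea of flagging any UUID claimed more than once.

```python
from collections import Counter


def duplicate_uuids(statements):
    """Return the sorted UUIDs that more than one statement claims.

    statements: iterable of dicts with a "uuid" key (hypothetical shape).
    An empty result means every UUID was used exactly once.
    """
    counts = Counter(stmt["uuid"] for stmt in statements)
    return sorted(uuid for uuid, n in counts.items() if n > 1)
```

A single pass with a counter keeps the check linear in the number of statements, at the cost of holding every UUID in memory.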

rrwright commented 7 years ago

PS - @harpocrates after that's done, you should close this issue. :-)

harpocrates commented 7 years ago

Fixed (not super efficiently) with https://github.com/GaloisInc/adapt/commit/1a25b8c899a6d28b87ad06ba27dcd51f490d5a4e.

rrwright commented 7 years ago

@harpocrates did you deploy these tests yet? I don't see them being run in the version I just ran a moment ago.

GaloisInc / adapt

Acceptance tests #166