CybOXProject / schemas

CybOX Schemas and Schema Development

Separate Patterns and Instances in CybOX Observables and Objects #381

Open ikiril01 opened 8 years ago

ikiril01 commented 8 years ago

Currently, a CybOX Observable, in combination with an Object, can define either an observed instance of some data (e.g., a file) or a pattern for detecting some data. While this duality has its uses and benefits, particularly in terms of reducing schema complexity (i.e., essentially the same schema structures can be used for capturing both instances and patterns), it has a number of drawbacks, including:

For these reasons, I propose the separation of instances and patterns in CybOX Observables and Objects. This will have a number of benefits, including:

Accordingly, this would entail deprecating the existing patterning-based structures in CybOX, and would require revisions to all CybOX Objects, making this suitable only for a major revision. It would also require the definition of a new patterning object (e.g., "PatternType") that functions as an implementation of a domain-specific patterning language. As a strawman, this could look something like:

<Observable id="obs-1">
  <Object>
    <Properties xsi:type="AddressObj:AddressObjectType">
      <Address_Value id="value-1">1.2.3.4</Address_Value>
    </Properties>
  </Object>
</Observable>

<Observable id="obs-2">
  <Object>
    <Properties xsi:type="AddressObj:AddressObjectType">
      <Address_Value id="value-2">2.3.4.5</Address_Value>
    </Properties>
  </Object>
</Observable>

<Pattern operator="AND">
  <Observable_Property condition="equals" idref="value-1"/>
  <Observable_Property condition="equals" idref="value-2"/>
  <Modifiers>
    <Sequence>
      <Ordinality>
        <Ordinal_Position value="1" idref="value-1"/>
        <Ordinal_Position value="2" idref="value-2"/>
      </Ordinality>
      <Time_Window units="minutes">60</Time_Window>
    </Sequence>
  </Modifiers>
</Pattern>
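To make the idref indirection concrete, here is a minimal Python sketch of how a consumer might evaluate the strawman `<Pattern>` above. It is not a CybOX API: the `Doc` wrapper element, the `xsi` namespace declaration (added only so the fragments parse), the `(timestamp, address)` input format, and the helper names `load_pattern` and `matches` are all hypothetical. It also assumes that `Time_Window` bounds the span between the first and last matched observations, and it supports only `operator="AND"` with `condition="equals"`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# The strawman fragments, wrapped in a made-up <Doc> root. The xsi prefix is
# declared only so that xsi:type parses; nothing else is assumed about schemas.
DOC = """<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Observable id="obs-1"><Object>
    <Properties xsi:type="AddressObj:AddressObjectType">
      <Address_Value id="value-1">1.2.3.4</Address_Value>
    </Properties>
  </Object></Observable>
  <Observable id="obs-2"><Object>
    <Properties xsi:type="AddressObj:AddressObjectType">
      <Address_Value id="value-2">2.3.4.5</Address_Value>
    </Properties>
  </Object></Observable>
  <Pattern operator="AND">
    <Observable_Property condition="equals" idref="value-1"/>
    <Observable_Property condition="equals" idref="value-2"/>
    <Modifiers><Sequence>
      <Ordinality>
        <Ordinal_Position value="1" idref="value-1"/>
        <Ordinal_Position value="2" idref="value-2"/>
      </Ordinality>
      <Time_Window units="minutes">60</Time_Window>
    </Sequence></Modifiers>
  </Pattern>
</Doc>"""


def load_pattern(xml_text):
    """Resolve the idrefs; return (values in required order, time window)."""
    root = ET.fromstring(xml_text)
    # Leaf elements carrying an id are the referenceable property values.
    values = {el.get("id"): (el.text or "").strip()
              for el in root.iter() if el.get("id") and len(el) == 0}
    pattern = root.find("Pattern")
    if pattern.get("operator") != "AND":
        raise ValueError("sketch only handles operator='AND'")
    props = list(pattern.iter("Observable_Property"))
    if any(p.get("condition") != "equals" for p in props):
        raise ValueError("sketch only handles condition='equals'")
    positions = sorted(pattern.iter("Ordinal_Position"),
                       key=lambda p: int(p.get("value")))
    if {p.get("idref") for p in props} != {p.get("idref") for p in positions}:
        raise ValueError("every property must have an ordinal position")
    window_el = pattern.find("Modifiers/Sequence/Time_Window")
    if window_el.get("units") != "minutes":
        raise ValueError("sketch only handles units='minutes'")
    return ([values[p.get("idref")] for p in positions],
            timedelta(minutes=int(window_el.text)))


def matches(events, ordered, window):
    """events: time-sorted (timestamp, address) pairs. True if the required
    addresses occur in order, with the first-to-last span <= window."""
    for i, (start, address) in enumerate(events):
        if address != ordered[0]:
            continue
        k = 1  # how many required values have matched from this start
        for ts, addr in events[i + 1:]:
            if k == len(ordered) or ts - start > window:
                break
            if addr == ordered[k]:
                k += 1
        if k == len(ordered):
            return True
    return False


ordered, window = load_pattern(DOC)
t0 = datetime(2016, 1, 1, 12, 0)
print(matches([(t0, "1.2.3.4"), (t0 + timedelta(minutes=30), "2.3.4.5")], ordered, window))  # True
print(matches([(t0, "2.3.4.5"), (t0 + timedelta(minutes=30), "1.2.3.4")], ordered, window))  # False: wrong order
print(matches([(t0, "1.2.3.4"), (t0 + timedelta(minutes=90), "2.3.4.5")], ordered, window))  # False: outside window
```

Even in this small sketch, most of the code is spent chasing ids between the Observables and the Pattern rather than expressing the match itself.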
ikiril01 commented 8 years ago

Also, if we do go down this road, it may make sense to move patterns and the associated structures into STIX, given that they're really quite specific to STIX Indicators.

johnwunder commented 8 years ago

:+1:

I like the idea of moving patterning to STIX...it seems like that behavior more naturally fits inside STIX indicators (which are about matching things) than CybOX.

bworrell commented 8 years ago

I agree with @johnwunder. :+1: for everybody!

athiasjerome commented 8 years ago

For reference: https://stixproject.github.io/data-model/1.2/cybox/ObservableType/

I concur that having the Observation (patterning) separate from the Observables makes sense and, IMHO, would be easier to understand and manipulate. I want to point out that what we're describing looks just like the OVAL mechanisms plus the non-static elements of timing and sequencing.

bgro commented 8 years ago

I really would like to get rid of AND/OR/...-composition on the level of indicators and push the whole pattern language into (oh no, here it comes) yet another STIX top-level object, STIX_Pattern or whatever. Then allow STIX_Patterns to reference other patterns for composition, but not Indicators.

The reason is that the whole logical and temporal composition is really complicated. The Indicator is absolutely central to STIX, but like I said above, there may be use-cases to communicate things to look out for without the use of CybOX and CybOX patterning but via a test mechanism (or, I haven't given up on that idea, a STIX-standardized key/value-pair mechanism for communicating patterns).

Now, if we encapsulate the complications of the pattern language in something that is removed from the Indicator level, then one can implement the non-patterning part of STIX much more easily in order to use it for use cases as described above. Also, using STIX for other domains (fraud and whatnot), as was mentioned on the list yesterday, could become easier, since for these other domains the full might of STIX patterns may not be needed ... or such domains may even need something else entirely. (Hm, I just realized that this reasoning would actually support making patterning part of CybOX as a CybOX-pattern-top-level object.)
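A rough Python sketch of the separation proposed here: composition lives only in a hypothetical top-level `StixPattern` object that may reference other patterns, while an `Indicator` points at exactly one pattern and does no composition of its own. Leaf patterns are plain key/value tests, standing in for the CybOX-free key/value-pair idea from the previous comment. None of these classes or fields exist in STIX, and the cycle check is an added assumption about what reference-based composition would need.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: an observation is just a flat key/value mapping.
Observation = Dict[str, str]


@dataclass
class StixPattern:
    """Hypothetical top-level pattern object (not part of STIX 1.x)."""
    id: str
    operator: str = "AND"                                  # composites only
    refs: List[str] = field(default_factory=list)          # other pattern ids
    test: Optional[Callable[[Observation], bool]] = None   # leaves only


@dataclass
class Indicator:
    """An indicator references one pattern; it never composes patterns."""
    id: str
    pattern_ref: str


def evaluate(pattern_id: str, patterns: Dict[str, StixPattern],
             obs: Observation, _seen: Tuple[str, ...] = ()) -> bool:
    """Evaluate a pattern by id, following references to other patterns."""
    if pattern_id in _seen:
        raise ValueError(f"cyclic pattern reference via {pattern_id}")
    pattern = patterns[pattern_id]
    if pattern.test is not None:
        return pattern.test(obs)
    results = (evaluate(ref, patterns, obs, _seen + (pattern_id,))
               for ref in pattern.refs)
    if pattern.operator == "AND":
        return all(results)
    if pattern.operator == "OR":
        return any(results)
    raise ValueError(f"unsupported operator {pattern.operator!r}")


def indicator_fires(indicator: Indicator, patterns, obs) -> bool:
    return evaluate(indicator.pattern_ref, patterns, obs)


patterns = {
    "pattern-ip": StixPattern("pattern-ip", test=lambda o: o.get("ip") == "1.2.3.4"),
    "pattern-port": StixPattern("pattern-port", test=lambda o: o.get("port") == "4444"),
    "pattern-both": StixPattern("pattern-both", refs=["pattern-ip", "pattern-port"]),
}
ind = Indicator("indicator-1", pattern_ref="pattern-both")
print(indicator_fires(ind, patterns, {"ip": "1.2.3.4", "port": "4444"}))  # True
print(indicator_fires(ind, patterns, {"ip": "1.2.3.4", "port": "80"}))    # False
```

A consumer that only needs the non-patterning parts of STIX could ignore `StixPattern` entirely and still parse `Indicator`, which is the decoupling argued for above.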

bgro commented 8 years ago

Looking at the example provided by Ivan some more: the downside of the pattern language referencing the observables is that it becomes utterly, utterly unreadable. I know that with the complexity of STIX/CybOX, human readability is not the measure of all things, but still ...

Also, in terms of not making things more complicated than necessary: if I want to write a rule that involves IP 127.0.0.1, I have to specify a complete CyberObservable with ID and everything, just to be able to reference a single property in the observable by yet another identifier that pertains to a single line within that definition.

OpenIOC uses a sort of path-statement for expressing queries. I am sure when starting CybOX there was a reason for not using that mechanism for patterning, but what was it? Because the following really looks quite usable to me:

<Pattern operator="AND">
 <Observable_Property type="AddressObject" path="Properties/Address_Value" condition="equals">
  1.2.3.4
 </Observable_Property>
 <Observable_Property type="AddressObject" path="Properties/Address_Value" condition="equals">
  2.3.4.5
 </Observable_Property>
 (...)
</Pattern>

The possible 'paths' would be defined by the structure of the CybOX object ...
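A minimal Python sketch of how such path-based properties could be checked. It assumes, hypothetically, that the type and path are written as plain `type`/`path` XML attributes (so the fragment is well-formed, and with the `(...)` continuation omitted), that the path is evaluated with ElementTree's limited XPath support relative to an object's root element, and that `AND` means each property must be satisfied by at least one observed object, not necessarily the same one. The names `address_object`, `property_holds`, and `pattern_holds` are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

PATTERN = """<Pattern operator="AND">
  <Observable_Property type="AddressObject" path="Properties/Address_Value"
                       condition="equals">1.2.3.4</Observable_Property>
  <Observable_Property type="AddressObject" path="Properties/Address_Value"
                       condition="equals">2.3.4.5</Observable_Property>
</Pattern>"""

# Hypothetical representation of an observed object: (object type, XML element).
TypedObject = Tuple[str, ET.Element]


def address_object(value: str) -> TypedObject:
    """Build a toy Address object instance (value is not XML-escaped)."""
    xml = f"<Object><Properties><Address_Value>{value}</Address_Value></Properties></Object>"
    return ("AddressObject", ET.fromstring(xml))


def property_holds(prop: ET.Element, obj: TypedObject) -> bool:
    obj_type, root = obj
    if prop.get("type") != obj_type:
        return False
    if prop.get("condition") != "equals":
        raise ValueError("sketch only supports condition='equals'")
    target = root.find(prop.get("path"))  # path is relative to <Object>
    return target is not None and (target.text or "").strip() == (prop.text or "").strip()


def pattern_holds(pattern: ET.Element, objects: List[TypedObject]) -> bool:
    combine = {"AND": all, "OR": any}[pattern.get("operator")]
    return combine(any(property_holds(p, o) for o in objects)
                   for p in pattern.findall("Observable_Property"))


pattern = ET.fromstring(PATTERN)
print(pattern_holds(pattern, [address_object("1.2.3.4"), address_object("2.3.4.5")]))  # True
print(pattern_holds(pattern, [address_object("1.2.3.4")]))                             # False
```

No separate Observable, and no ids, are needed to state the test: the value sits next to the path it constrains.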

MarkDavidson commented 8 years ago

I think separating instances from patterns would clarify/simplify certain structures, and I think it's a good idea.

In terms of pushing patterning into STIX, I'm not as sold on that. Since patterning is going to depend on some object model (e.g., CybOX), I'm not sure how the patterns and objects could live in separate places without those two areas being very tightly coupled (tight coupling is an argument for them being the same thing, IMO). Is there an idea for how patterns could live in STIX and objects in CybOX without a tight coupling?

-Mark

johnwunder commented 8 years ago

Separating out instances from patterns in CybOX would be a huge conceptual simplification. The current approach means that other data models can't say (in a schema structure) which type they expect, and consumer/producer tools have a harder time figuring out what they should expect or produce.
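As a rough illustration of the schema-typing point, the Python sketch below gives instances and patterns distinct types, so a field declared to hold an observed instance is not expected to receive a pattern, and vice versa. `AddressInstance`, `AddressPattern`, and the pared-down `Sighting`/`Indicator` classes are invented for this example and are not the STIX or CybOX bindings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressInstance:
    """Something that was actually observed: a concrete address value."""
    address_value: str


@dataclass(frozen=True)
class AddressPattern:
    """Something to look for: a condition applied to observed addresses."""
    condition: str  # only "equals" and "contains" in this sketch
    value: str

    def matches(self, instance: AddressInstance) -> bool:
        if self.condition == "equals":
            return instance.address_value == self.value
        if self.condition == "contains":
            return self.value in instance.address_value
        raise ValueError(f"unsupported condition {self.condition!r}")


@dataclass(frozen=True)
class Sighting:
    observed: AddressInstance  # consumers know they get a fact, not a test


@dataclass(frozen=True)
class Indicator:
    test: AddressPattern  # producers know they must supply a test


indicator = Indicator(AddressPattern("contains", "10.0.0."))
sighting = Sighting(AddressInstance("10.0.0.7"))
print(indicator.test.matches(sighting.observed))  # True
```

Python enforces these annotations only through a static checker such as mypy; in an XML schema the equivalent would be two distinct types that each referencing element names explicitly.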

One other thing to consider is that there are places in STIX that expect a CybOX pattern (or instance) other than just Indicator. Would the same "pattern" approach be used in all of those places as well? (e.g. TTP/Infrastructure)

Given the STIX dependence on CybOX I'm not sure a tight coupling would be a bad thing. If it were just indicators I might think otherwise, but between indicators, sightings, and all the other places CybOX is used I think doing a loose coupling potentially could make things more confusing.

ikiril01 commented 8 years ago

@johnwunder agreed - I think STIX and CybOX are inherently tightly coupled, which is not necessarily a bad thing.

@bgro yes, the above strawman example is horribly ugly. I do like the idea of using a path-based syntax for capturing the fields that one is patterning against; I believe the primary reason we didn't use it before is that it would have required the separation of Observable patterns and Observable instances.

Trying to express patterns in XML is painful. So, to combine the path-based idea and the strawman YARA-like syntax I floated on the discussion list, we could have something like:

pattern example_1
{
  objects:
    $OBJ1 = {type = FileObject,
             fields = [{"hashes/hash/simple_hash_value": "c38862b4835729d979e7940d72a48172",
                        "file_name": "abcd.dll"}]}

    $OBJ2 = {type = WinRegistryKeyObject,
             fields = [{"key": ".DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\{19127AD2-394B-70F5-C650-B97867BAA1F7}"},
                       {"hive": "HKEY_USERS"}]}

  condition:
    $OBJ1 and $OBJ2
}
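One way to pin down the semantics of this YARA-like strawman is to encode it as data and evaluate it. The Python sketch below assumes, hypothetically, that all fields listed for an object variable must hold on a single observed object (an implicit AND, so the two field dictionaries of `$OBJ2` are merged into one), that observed objects arrive as `(type, {path: value})` pairs, and that the condition has already been parsed into nested tuples. None of this is defined by CybOX, and the function names are illustrative.

```python
from typing import Dict, Tuple, Union

Fields = Dict[str, str]
Observed = Tuple[str, Fields]  # (object type, path -> value)
Condition = Union[str, Tuple[str, "Condition", "Condition"]]

REG_KEY = r".DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\{19127AD2-394B-70F5-C650-B97867BAA1F7}"

EXAMPLE_1 = {
    "objects": {
        "$OBJ1": ("FileObject", {
            "hashes/hash/simple_hash_value": "c38862b4835729d979e7940d72a48172",
            "file_name": "abcd.dll"}),
        "$OBJ2": ("WinRegistryKeyObject", {"key": REG_KEY, "hive": "HKEY_USERS"}),
    },
    "condition": ("and", "$OBJ1", "$OBJ2"),
}


def object_matches(spec: Observed, observed: Observed) -> bool:
    """Every field in the spec must hold on this one observed object."""
    want_type, want_fields = spec
    got_type, got_fields = observed
    return got_type == want_type and all(
        got_fields.get(path) == value for path, value in want_fields.items())


def evaluate(cond: Condition, bound: Dict[str, bool]) -> bool:
    if isinstance(cond, str):
        return bound[cond]
    op, left, right = cond
    if op == "and":
        return evaluate(left, bound) and evaluate(right, bound)
    if op == "or":
        return evaluate(left, bound) or evaluate(right, bound)
    raise ValueError(f"unsupported operator {op!r}")


def pattern_matches(pattern, observed) -> bool:
    # A variable is true if any observed object satisfies its spec.
    bound = {var: any(object_matches(spec, o) for o in observed)
             for var, spec in pattern["objects"].items()}
    return evaluate(pattern["condition"], bound)


dll = ("FileObject", {"hashes/hash/simple_hash_value": "c38862b4835729d979e7940d72a48172",
                      "file_name": "abcd.dll"})
key = ("WinRegistryKeyObject", {"key": REG_KEY, "hive": "HKEY_USERS"})
print(pattern_matches(EXAMPLE_1, [dll, key]))  # True
print(pattern_matches(EXAMPLE_1, [dll]))       # False
```

The sketch says nothing about which set of objects `observed` should cover (one system, one point in time, one enterprise); that scoping is left entirely to the caller.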
johnwunder commented 8 years ago

This is a little more philosophical than some of our current discussion, but what is the intent of that AND condition? Is it saying that those two objects exist on the same system? Exist on the same system at the same time? Exist in the same enterprise?

Similarly, there's (I assume) an implicit AND in those objects. There's a trade-off in verbosity and complexity but we could always make some of these implicit assumptions explicit in the structure:

pattern:
  fields:
    $field1: FileObject/hashes/hash/simple_hash_value EQUALS c38862b4835729d979e7940d72a48172
    $field2: FileObject/file_name CONTAINS malicious.dll
  condition:
    SAME_OBJ($field1, $field2)

Now that I type all that out, though, I kind of hate it. So complicated...
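The difference an explicit `SAME_OBJ` makes can be shown in a few lines of Python. In this hypothetical sketch, each `$field` is an `(object type, path, operator, value)` tuple; `same_obj` requires one object to satisfy every named field, while `each_somewhere` (an invented name for the weaker reading) lets different objects satisfy different fields. Only the `EQUALS` and `CONTAINS` operators from the example are supported.

```python
from typing import Dict, List, Tuple

Obj = Tuple[str, Dict[str, str]]   # (object type, path -> value)
Field = Tuple[str, str, str, str]  # (object type, path, operator, value)

FIELDS: Dict[str, Field] = {
    "$field1": ("FileObject", "hashes/hash/simple_hash_value", "EQUALS",
                "c38862b4835729d979e7940d72a48172"),
    "$field2": ("FileObject", "file_name", "CONTAINS", "malicious.dll"),
}


def field_holds(f: Field, obj: Obj) -> bool:
    want_type, path, op, value = f
    got_type, props = obj
    if got_type != want_type or path not in props:
        return False
    if op == "EQUALS":
        return props[path] == value
    if op == "CONTAINS":
        return value in props[path]
    raise ValueError(f"unsupported operator {op!r}")


def same_obj(names: List[str], objects: List[Obj]) -> bool:
    """SAME_OBJ: a single object must satisfy every named field."""
    return any(all(field_holds(FIELDS[n], o) for n in names) for o in objects)


def each_somewhere(names: List[str], objects: List[Obj]) -> bool:
    """Weaker reading: each field may be satisfied by a different object."""
    return all(any(field_holds(FIELDS[n], o) for o in objects) for n in names)


HASH = "c38862b4835729d979e7940d72a48172"
split = [("FileObject", {"hashes/hash/simple_hash_value": HASH, "file_name": "benign.dll"}),
         ("FileObject", {"hashes/hash/simple_hash_value": "0" * 32, "file_name": "malicious.dll"})]
print(same_obj(["$field1", "$field2"], split))        # False
print(each_somewhere(["$field1", "$field2"], split))  # True
```

The two readings disagree on `split`, which is why leaving the grouping implicit is ambiguous.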

athiasjerome commented 8 years ago

I suggest renaming "Observable Instance" to "Observation" (and also suggest reviewing the EventType).

athiasjerome commented 8 years ago

Reference to documentation (to be updated in case of changes) https://stixproject.github.io/documentation/suggested-practices/#observable

athiasjerome commented 8 years ago

Related https://github.com/STIXProject/schemas/issues/376

athiasjerome commented 8 years ago

From an information model point of view, the concepts of Components and Instances could be abstracted.

An example, just for reference (imagine replacing "Hardware/Software Component" with "CybOX Object/Cyber Component"):

TODO: The figure below needs to be updated to show the relationships between the different types of assets.
        <figure title="Model of an Endpoint"
            anchor="figure-model-of-an-endpoint">
            <artwork>
                <![CDATA[
            +---------+*______in>_______*+-----+
            |Hardware |                  |!   !|
            |Component|   +---------+    |!   !|
            +---------+   |Software |in> |!   !|
                1|        |Component|____|!   !|
                 |        +---------+*  *|!   !|
                 |            1|         |!   !|
                 |            *|         |     |       +----------+    
                 |        +---------+    |End- |*_____*| Identity |
                *|        |Software |in> |point| acts  +----------+
            +---------+   |Instance |____|     | for>        
            |Hardware |   +---------+*  1|!   !|          
            |Instance |__________________|!   !|           
            +---------+*      in>       1|!   !|
                                         |!   !|
                                         |!   !|____   
                                         |!   !|0..1|  
                                         +-----+    |     
                                            |*      |  
                                            |_______|  
                                               in> 
                ]]>
            </artwork>

Ref. https://raw.githubusercontent.com/sacmwg/draft-ietf-sacm-information-model/master/draft-ietf-sacm-information-model.xml
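A minimal Python sketch of the Component/Instance split from the figure, restricted to the software side and assuming, as the comment suggests, that a CybOX Object description could play the role of the Component. Components compare by value (two descriptions of the same thing are equal), while Instances compare by identity and each belongs to exactly one Endpoint, mirroring the one-to-many and many-to-one multiplicities drawn in the figure. The class and helper names are hypothetical and are not taken from the SACM draft.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    """What something is (e.g., a product and version); equal by value."""
    name: str
    version: str


@dataclass(eq=False)
class Endpoint:
    """An endpoint, holding the instances that live on it."""
    name: str
    instances: List["Instance"] = field(default_factory=list)


@dataclass(eq=False)
class Instance:
    """One concrete occurrence of a Component on exactly one Endpoint."""
    component: Component
    endpoint: Endpoint = field(repr=False)  # avoid endpoint <-> instance repr loops
    details: Dict[str, str] = field(default_factory=dict)


def add_instance(component: Component, endpoint: Endpoint, **details: str) -> Instance:
    inst = Instance(component, endpoint, dict(details))
    endpoint.instances.append(inst)
    return inst


def endpoints_with(component: Component, endpoints: List[Endpoint]) -> List[str]:
    """Names of endpoints holding at least one instance of the component."""
    return [e.name for e in endpoints
            if any(i.component == component for i in e.instances)]


host_a, host_b = Endpoint("host-a"), Endpoint("host-b")
add_instance(Component("ExampleApp", "1.0"), host_a, path="/opt/exampleapp")
add_instance(Component("ExampleApp", "2.0"), host_b, path="/opt/exampleapp")
print(endpoints_with(Component("ExampleApp", "1.0"), [host_a, host_b]))  # ['host-a']
```

In CybOX terms, a pattern would describe something at the Component level, and an observation would report an Instance.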