PROBLEM
There are too many ways to represent the same thing in STIX, and this has a knock-on effect when it comes time to write code. We need to simplify so that there is one mandated, recommended way of doing each thing, which all parties are required to support. That main way needs to cover 80% of situations. We also need to allow niche, customized extensions to STIX so that the other 20% can implement what they need, but the main functionality must be well-defined and simple.
As an example of the problem, below is a list of the ways in which a relationship between two IP addresses can currently be described in STIX. Any STIX-compatible solution needs to be able to ingest data in all of these forms, which creates extra work and difficulty for implementers.
12 ways to express context between 2 IP addresses
1. Indicator, with two inline IPv4 AddressObjects
2. Indicator, with two referenced IPv4 AddressObjects
3. Indicator, with one inline IPv4 AddressObject using comma notation (127.0.0.1##comma##127.0.0.2)
4. Indicator, with one referenced IPv4 AddressObject using comma notation (127.0.0.1##comma##127.0.0.2)
5. A composite indicator including a single indicator, with two inline IPv4 AddressObjects
6. A composite indicator including a single indicator, with two referenced IPv4 AddressObjects
7. A composite indicator including a single indicator, with one referenced IPv4 AddressObject using comma notation (127.0.0.1##comma##127.0.0.2)
8. A composite indicator including a single indicator, with one inline IPv4 AddressObject using comma notation (127.0.0.1##comma##127.0.0.2)
9. A composite indicator with two indicators. Each indicator has a single inline IPv4 AddressObject
10. A composite indicator with two indicators. Each indicator has a single referenced IPv4 AddressObject
11. Two AddressObjects, no indicators, and "These IP addresses are malicious" placed in the Title field of the STIX_Header (implicit relationship)
12. One AddressObject using comma notation (127.0.0.1##comma##127.0.0.2), no indicators, and "These IP addresses are malicious" placed in the Title field of the STIX_Header (implicit relationship)
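To give a feel for the extra work just one of these variations creates, here is a minimal Python sketch of the normalization a consumer would need before it can compare addresses that arrive in comma notation. The function name is hypothetical, and the delimiter is simply the one shown in the examples above; this is an illustration, not code from any STIX library.

```python
from typing import List

# Assumption: "##comma##" is the list delimiter used in the examples above.
DELIMITER = "##comma##"


def split_address_values(value: str) -> List[str]:
    """Split a comma-notation value into its individual address strings.

    A value without the delimiter is returned as a one-element list, so the
    caller can treat single and multi-valued AddressObjects the same way.
    """
    return [part.strip() for part in value.split(DELIMITER) if part.strip()]


if __name__ == "__main__":
    print(split_address_values("127.0.0.1##comma##127.0.0.2"))
```

This covers only the comma notation. The inline/referenced and composite/non-composite variations each need their own handling on top of it.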
POTENTIAL ANSWER
This is a general issue with STIX, CybOX and TAXII. There is an abundance of flexibility built into these specifications. That flexibility causes its own problems: it introduces complexity and makes it much harder for implementers to actually write code for the multitude of scenarios needed to cover the variations the standard allows.
We need to decide what the high-level goals of STIX v2.0 development are: what are we actually trying to achieve with STIX v2.0? I believe the following list is a good start:
Goals (developed from the TAXII group goals):
Simplicity
  Easy to implement and understand
  One way of doing things where possible
  Reduce optionality
  Support Customization in a simple, standardized way
  Don’t allow customization everywhere, only where it is likely to be used
Standardization
  Do things in the same way across STIX and CybOX
  Reuse similar structures in similar yet distinct parts of the model
Modularity
  Provide building blocks that can be reused elsewhere
  Ensure tight cohesion, and low coupling of those building blocks
Flexibility
  Use modularity to provide flexibility
  Allow relationships to exist between any objects
Better Analysis
  Easier to graph relationships
  Easier to track changes over time
  Easier to put together timelines
Minimize resource usage
  Reduce size where possible
  Only transmit what is necessary
Target the 80%
  Concentrate on the common scenarios (Use Cases)
  Work on the other 20% in subsequent releases
For the specific scenario described above, there are a few issues at play:
The ability to either Inline or Reference content
The ability to either create Composite Indicators or a list of items within an object to reflect grouping
The ability to express a relationship either implicitly or explicitly
All 12 of these ways are trying to do a fairly simple thing: show a relationship between two objects. I believe the problems can be fixed, here and in other parts of the STIX data model, by adding a few changes/rules (as described in the previous section):
Make relationships a top-level object. All other STIX objects are considered STIX data nodes
Require relationships to be explicit: a relationship exists ONLY if a relationship object links two STIX data nodes together
Multiple relationships are shown with multiple single relationship objects to allow for easier
Deprecate the ##COMMA## notation and replace it with multiple single relationship objects.
Relationships are allowed to be sent independently of the STIX data nodes they refer to.
The reason I am suggesting 'single' relationship objects is to allow third parties to [+1/-1] each individual relationship, and to keep individual objects so that the order in which things happened can be traced on a timeline. This will help when analysis targets a particular sequence of events.
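The rules above can be sketched in Python as follows. This is only an illustration of the idea, not a proposed schema: the class name, the field names (source_ref, target_ref, relationship_type, created, votes) and the helper functions are all hypothetical. Each relationship is its own top-level object that holds only the identifiers of the two data nodes it links, so it can be sent without them, voted on individually, and ordered on a timeline.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import combinations
from typing import Dict, Iterable, List


@dataclass
class Relationship:
    """Hypothetical top-level object linking exactly two STIX data nodes."""
    source_ref: str          # identifier of the first data node
    target_ref: str          # identifier of the second data node
    relationship_type: str   # hypothetical label, e.g. "related-to"
    created: datetime
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    votes: Dict[str, int] = field(default_factory=dict)  # voter -> +1 or -1

    def vote(self, voter: str, value: int) -> None:
        """Record a third party's +1/-1 on this single relationship."""
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        self.votes[voter] = value

    @property
    def score(self) -> int:
        return sum(self.votes.values())


def relate_all(node_ids: Iterable[str], relationship_type: str,
               created: datetime) -> List[Relationship]:
    """Emit one single relationship per pair of nodes, not one grouped value."""
    return [Relationship(a, b, relationship_type, created)
            for a, b in combinations(list(node_ids), 2)]


def timeline(relationships: Iterable[Relationship]) -> List[Relationship]:
    """Order relationships by creation time to trace a sequence of events."""
    return sorted(relationships, key=lambda r: r.created)


if __name__ == "__main__":
    now = datetime(2015, 1, 1, tzinfo=timezone.utc)
    rels = relate_all(["address-1", "address-2"], "related-to", now)
    rels[0].vote("org-a", 1)
    rels[0].vote("org-b", -1)
    print(len(rels), rels[0].score)
```

With two addresses this yields exactly one relationship object, replacing all 12 variations listed in the problem statement. With more nodes, each pair gets its own object, which is what lets a third party agree or disagree with one link without affecting the others.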