Open MarkDavidson opened 10 years ago
The way the current markings are done is really hard to implement and use and is highly XML centric. If it is hard to implement and use, then my fear is people will just not implement it, which does not help anyone.
I would like to see marking done at the object level and have it be the first element in the object. You could use inheritance so that you only needed to put it in at change points. Doing it this way also allows me to read the marking value FIRST and then decide if I should read and process the rest of the data structure that I am currently in. Meaning, if the marking is RED and my tool is not equipped to handle RED items, I should probably stop processing that data right there and then. Not read it all in, process it, and then try to figure out its marking.
I would propose something like this. The marking could also go at the stix:Indicators level or even the STIX package level (yes, I know there is nothing else at the Indicators (plural) level yet). This would also solve the problems for other serializations.
<stix:indicator>
<marking xsi:type="tlp:TLPMarkingStructureType" tlp:color="GREEN"/>
.. stuff ..
.. more stuff..
<indicator:Description>
<marking xsi:type="tlp:TLPMarkingStructureType" tlp:color="RED"/>
Some details about this and that
</indicator:Description>
</stix:indicator>
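A minimal sketch of the "read the marking first" idea, in Python. It assumes a simplified, namespace-free stand-in for the XML above (the element names `package`, `indicator`, `marking`, `title` and the `ALLOWED` set are hypothetical, not STIX), and it only handles one marking per indicator, not nested markings such as the one inside the Description. Note that the parser still tokenizes the restricted element; what the tool avoids is processing or retaining its content.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in document (hypothetical element names, not real STIX).
SAMPLE = """<package>
  <indicator id="a"><marking color="GREEN"/><title>ok to read</title></indicator>
  <indicator id="b"><marking color="RED"/><title>restricted</title></indicator>
</package>"""

# Assumption: the TLP colors this particular tool is equipped to handle.
ALLOWED = {"WHITE", "GREEN"}


def readable_titles(xml_text, allowed=ALLOWED):
    """Stream the document; keep titles only from indicators whose
    leading marking is in the allowed set."""
    titles = []
    skipping = False
    source = io.StringIO(xml_text)
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "marking":
            # Attributes are already available on the start event, so the
            # decision is made before the rest of the indicator is handled.
            skipping = elem.get("color") not in allowed
        elif event == "end" and elem.tag == "title" and not skipping:
            titles.append(elem.text)
        elif event == "end" and elem.tag == "indicator":
            skipping = False  # the marking applies to this indicator only
            elem.clear()      # drop the content we no longer need
    return titles


TITLES = readable_titles(SAMPLE)
```

Because the marking is the first child, the skip decision never depends on content that comes later in the same object.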
Strong proponent of Bret's vision here.
One downside to this approach is that it requires supporting mixed content in almost every field. An alternative would be including a "marking" attribute on most (all?) types which are references to MarkingStructures defined elsewhere.
Deferring to @wbolster, who will check it out in a bit - and has a bigger brain than I do :)
To clarify, by "almost every field", I mean "almost every field that currently contains simpleContent". Fields which are complexContent (most of the higher level types) can stay that way.
I have been working on this for JSON, as that was assigned to me on the last community call. And I think what I am leaning towards is a modified version of what I have above.. Something where we make a top level marking block and the markings have an ID. Then at each level where there is a change, we add a marking IDREF. This would allow us to define the markings once, but allow us to put the markings in the elements that need it. You would have full inheritance and it would be pretty easy to implement in tools since we already have to deal with IDREF content.
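To make the ID/IDREF-with-inheritance idea concrete, here is a rough Python sketch over a JSON-like structure. The field names (`markings`, `marking_ref`, `children`) and the sample IDs are assumptions made up for illustration; nothing here is an agreed STIX JSON layout.

```python
# Hypothetical JSON-like package: markings are defined once at the top level
# and referenced by ID only where the marking changes.
PACKAGE = {
    "markings": {
        "foogreen": {"tlp": "GREEN"},
        "foored": {"tlp": "RED"},
    },
    "marking_ref": "foogreen",  # package-level default
    "children": [
        {
            "id": "indicator-1",
            "children": [
                {"id": "description-1"},                           # inherits
                {"id": "description-2", "marking_ref": "foored"},  # overrides
            ],
        },
    ],
}


def effective_markings(node, markings, inherited=None, out=None):
    """Walk the tree and record the marking in force for every node with an id.

    A node's own marking_ref wins; otherwise it inherits the parent's.
    """
    out = {} if out is None else out
    ref = node.get("marking_ref", inherited)
    if "id" in node:
        out[node["id"]] = markings[ref] if ref is not None else None
    for child in node.get("children", []):
        effective_markings(child, markings, ref, out)
    return out


RESOLVED = effective_markings(PACKAGE, PACKAGE["markings"])
```

The resolver only needs the parent's reference while descending, which is the same bookkeeping tools already do for other IDREF content.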
I like @jordan2175's second take on this.
Defining a global (top level) marking block on the package level, and adding attributes to components that can be covered by a specific marking (e.g. Indicator), makes a lot of sense. This would completely eliminate the cumbersome and XML-specific XPath approach that is currently in use, and enable markings to be used when using non-XML STIX representations. (At @intelworks we're particularly interested in JSON-like data models.)
Additionally, the <marking:Controlled_Mapping> "solution" mentioned by @MarkDavidson is (again) extremely XML-specific, and very cumbersome to use/implement, at least with the XML libraries I know.
It seems like there are two separate but related topics being discussed here:
1- representing the current marking capability in JSON
2- developing a new approach to markings for a major revision of the STIX language
Item 2 should be informed by item 1. For item 1 to work, it needs to stick closely to the intent of the current marking structure. I am thinking that we might want to separate this into two different issues and allow this issue to focus on the revision of the stix language for version 2.0. Does that make sense?
Next, it is not clear to me that we are all operating off of a shared understanding of the requirements for data marking. In order to advance both of the above topics we should probably get a good handle on these requirements. We created a wiki page to allow us to start tracking these and other requirements and design decisions:
https://github.com/STIXProject/schemas/wiki/Design-Rationale
No, I do not see us discussing two things... Just one. The fact that I am working on JSON is really irrelevant to this. However, the benefit of working on another implementation allows us to more easily see issues, problems and possible solutions to the existing XML implementation.
So as I said above, for XML, I would like to see Marking be a first-class citizen, aka a top-level object and have an ID as part of it..
<Handling>
<Marking id="foogreen" xsi:type="tlp:TLPMarkingStructureType" tlp:color="GREEN">
bla bla bla
</Marking>
<Marking id="foored" xsi:type="tlp:TLPMarkingStructureType" tlp:color="RED">
bla bla bla
</Marking>
</Handling>
Then in something like Indicators we could do:
<stix:Indicators marking="foogreen">
<stix:Indicator id="bla">
<stix:Description marking="foogreen"> some non restricted text </stix:Description>
<stix:Description marking="foored"> some restricted text </stix:Description>
bla bla bla
</stix:Indicator>
</stix:Indicators>
So we set the top level as green and then do something specific deep inside and set that to red.
How would indicators support multiple markings? What if I want to mark something TLP:WHITE but also apply a copyright?
I do agree with @jonathanbaker that it would be helpful to outline these types of use cases in the wiki. For example:
If we are talking about an ideal solution for a future version of STIX, I think I'd have this to offer as requirements (Note: If these look like we'd want to adopt them from a STIX perspective, I can edit the wiki):
STIX Data Marking Requirements:
I'm going to step back from XML for a moment and offer this notional structure that meets some of the requirements I've set out:
Markings:
- TLP:WHITE; ID=1
- Copyright Mark Davidson; ID=2
Indicators:
- Indicator 1; File Hash=0xFEEDBEEF
- Indicator 2; IP=1.2.3.4
Relationships:
- From=Marking_1; To=Indicator_1; Relationship=Marks
My hope is that everyone can read this structure and envision an implementation in their respective technology stack.
-Mark
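For what it's worth, the notional Markings / Indicators / Relationships structure above maps fairly directly onto plain data structures. A small Python sketch follows; the second relationship (copyright marks Indicator 1) is an added assumption to show the multiple-markings case, and the tuple layout is purely illustrative.

```python
from collections import defaultdict

# Data taken from the notional structure above.
MARKINGS = {1: "TLP:WHITE", 2: "Copyright Mark Davidson"}
INDICATORS = {1: {"file_hash": "0xFEEDBEEF"}, 2: {"ip": "1.2.3.4"}}

# Each relationship is (marking_id, indicator_id), read as "marking Marks indicator".
# The first entry is from the structure above; the second is an assumption added
# here to show one indicator carrying two markings.
RELATIONSHIPS = [(1, 1), (2, 1)]


def markings_for(relationships, markings):
    """Group marking values by the indicator they are attached to."""
    by_indicator = defaultdict(list)
    for marking_id, indicator_id in relationships:
        by_indicator[indicator_id].append(markings[marking_id])
    return dict(by_indicator)


APPLIED = markings_for(RELATIONSHIPS, MARKINGS)
```

Indicator 2 has no relationship pointing at it, so it simply does not appear in the result; whether "unmarked" should mean anything is a separate design question.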
I would assert that at least the first three of those are implementation approaches, not requirements.
If we are talking about requirements for data markings we need to stay focused on actual use case requirements of what sorts of information needs to be represented to support the analysis and exchange of cyber threat information not on the potential implementation approaches for the structure. The latter must flow from the former not the other way around. The "obvious" implementation approaches can often miss important use case requirements. It is fine to start with the "obvious" approaches that solve many requirements but it is important to then test the approach with all of the requirements and evolve/iterate as issues are uncovered.
I hope to have a chance soon to capture as many of the markings use case requirements as I can (in the wiki) from the 2-3 years of community discussions on this topic.
Yes, we need to outline the requirements... Some that I have thought about so far are, and I added these to the wiki.
1) Ease of consuming the markings and keeping track of what is bound to what. 2) Allow parsers to understand what marking is to be applied to an object, before the said object is read and processed. I like the idea of the first element being read is the marking element. 3) Apply multiple markings to a single element.
I think the solution is some sort of inheritance model. The question is how best to craft that model so that it is clear what is going on and it is EASY to use. I guess I can see a situation where people might have very elaborate markings; however, I am guessing that these markings are somewhat static, meaning that they get reused a lot on subsequent documents. So maybe doing something like a Marking_Group, or following along with Terry's relationship object somehow. I could also see something like:
<Handling>
<Marking id="foo1234">
<Detail tlp="red"/>
<Detail>Copyright</Detail>
</Marking>
</Handling>
@sbarnum that would be great. And yes, they are implementation elements not use cases, but I wanted to get them down as requirements for the design.
@MarkDavidson I like your idea from your last post
I updated the wiki with a few of the key requirements that led to our current implementation (STIX 1.1.1 and before):
I added a new data markings page to capture this (took the existing requirements that @jonathanbaker, @jordan2175 and @MarkDavidson wrote up). I also added some basic writeups of the solutions. Please fill in more!
https://github.com/STIXProject/schemas/wiki/Design:-Data-Markings
There are some known complexities to implementing Data Markings, largely owing to the use of XPath to mark documents. There are known solutions for each of the complexities, so none of them are showstoppers, but it might make sense to attempt a modification/refactor in the future to make implementation easier.
Known complexities (and their solutions using STIX 1.1.1):
Marking_Structure (e.g., stix:STIX_Package//node()). These namespace aliases use the node's namespace declarations. This requires that any time a STIX_Package is (re)shared, the sharing application must pay attention to any modification to the XML Document's namespace declarations and replicate those modifications in all Marking_Structure elements in the document.
Marking_Structure values that do not use namespace aliases (e.g., *[local-name() = 'Title' and namespace-uri() = 'http://stix.mitre.org/Indicator-2'])
Marking_Structure values are appropriately modified. Note that this modification may break signatures.
Marking_Structure, as it effectively requires a lossless translation of XPath semantics to the target technology.
Future revisions of STIX could address these challenges multiple ways. The ones I could think of are listed here:
Marking_Structure (Could be done in a minor release). E.g.,
Thoughts on this welcome. -Mark
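The namespace-alias complexity described above can be illustrated with a short Python sketch. It uses the standard library's ElementTree, whose limited XPath support does not include the local-name()/namespace-uri() predicate, so the `{uri}Title` form stands in as the analogous alias-free lookup. The two sample documents and the helper names are made up for the illustration.

```python
import io
import xml.etree.ElementTree as ET

NS_URI = "http://stix.mitre.org/Indicator-2"

# The same content serialized with two different aliases for one namespace URI,
# as can happen when a document is re-shared by another application.
DOC_A = (f'<indicator:Indicator xmlns:indicator="{NS_URI}">'
         f'<indicator:Title>t</indicator:Title></indicator:Indicator>')
DOC_B = (f'<ind:Indicator xmlns:ind="{NS_URI}">'
         f'<ind:Title>t</ind:Title></ind:Indicator>')


def declared_prefixes(xml_text):
    """Collect the prefix -> URI declarations found in the document itself."""
    events = ET.iterparse(io.StringIO(xml_text), events=("start-ns",))
    return dict(prefix_uri for _, prefix_uri in events)


def find_title_by_alias(xml_text):
    """Resolve an alias-based path against the document's own declarations."""
    root = ET.fromstring(xml_text)
    try:
        return root.find("indicator:Title", declared_prefixes(xml_text))
    except SyntaxError:
        # ElementTree raises SyntaxError when the prefix is not declared.
        return None


def find_title_by_uri(xml_text):
    """Alias-free lookup: match on the namespace URI and local name."""
    root = ET.fromstring(xml_text)
    return root.find(f"{{{NS_URI}}}Title")
```

The alias-based path stops matching as soon as the alias changes, while the URI-based lookup keeps working for both documents.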