Consider modifying the mechanism by which Data Markings are applied

MarkDavidson commented 10 years ago

There are some known complexities to implementing Data Markings, largely owing to the use of XPath to mark documents. There are known solutions for each of the complexities, so none of them are showstoppers, but it might make sense to attempt a modification/refactor in the future to make implementation easier.

Known complexities (and their solutions using STIX 1.1.1):

Namespace aliases are brittle - The current implementation of Data Markings makes it possible to use namespace aliases in the Marking_Structure (e.g., stix:STIX_Package//node()). These namespace aliases use the node's namespace declarations. This requires that any time a STIX_Package is (re)shared, the sharing application must pay attention to any modification to the XML Document's namespace declarations and replicate those modifications in all Marking_Structure elements in the document.
- Solution 1 (For producers) - Use Marking_Structure values that do not use namespace aliases (e.g., *[local-name() = 'Title' and namespace-uri() = 'http://stix.mitre.org/Indicator-2'])
- Solution 2 (For re-sharers) - Make sure to either persist the original namespace declarations or, if the namespace declarations are modified, make sure that the Marking_Structure values are appropriately modified. Note that this modification may break signatures.
Marking_Structure values are hard to apply outside of XML - Translating STIX into another technology (Python objects, SQL/NoSQL database, JSON, etc) makes it very difficult to use the XPath contained in the Marking_Structure, as it effectively requires a lossless translation of XPath semantics to the target technology.
- Solution 1 - Keep the original XML around for the purpose of evaluating markings. (I think?)
Are there others?
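
To make the namespace-alias complexity concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The two sample documents, the `ind` alias and the function names are made up for illustration. ElementTree's limited XPath subset has no local-name() or namespace-uri(), so the alias-free lookup is written in `{uri}local` form, which expresses the same idea as Solution 1.

```python
import io
import xml.etree.ElementTree as ET

INDICATOR_NS = "http://stix.mitre.org/Indicator-2"

# Hypothetical documents: same content, but the re-sharer renamed the alias.
ORIGINAL = (
    '<indicator:Indicator xmlns:indicator="http://stix.mitre.org/Indicator-2">'
    "<indicator:Title>Bad IP</indicator:Title>"
    "</indicator:Indicator>"
)
RESHARED = (
    '<ind:Indicator xmlns:ind="http://stix.mitre.org/Indicator-2">'
    "<ind:Title>Bad IP</ind:Title>"
    "</ind:Indicator>"
)


def declared_prefixes(xml_text):
    """Return the prefix -> URI declarations found in the document."""
    stream = io.BytesIO(xml_text.encode("utf-8"))
    events = ET.iterparse(stream, events=("start-ns",))
    return {prefix: uri for _, (prefix, uri) in events}


def select_with_aliases(xml_text, path):
    """Evaluate an alias-based path against the document's own declarations."""
    root = ET.fromstring(xml_text)
    try:
        return root.findall(path, declared_prefixes(xml_text))
    except SyntaxError:
        # ElementTree raises SyntaxError when the path uses an undeclared prefix.
        return []


def select_without_aliases(xml_text, namespace, local_name):
    """Evaluate an alias-free path; it depends only on the namespace URI."""
    root = ET.fromstring(xml_text)
    return root.findall(".//{%s}%s" % (namespace, local_name))
```

With these helpers, `select_with_aliases(ORIGINAL, ".//indicator:Title")` finds the title, the same call on `RESHARED` finds nothing, and `select_without_aliases(RESHARED, INDICATOR_NS, "Title")` still finds it.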

Future revisions of STIX could address these challenges in multiple ways. The ones I could think of are listed here:

Add elements that allow the specification of namespace alias and namespace in the Marking_Structure (Could be done in a minor release). E.g.,

<marking:Controlled_Structure>/stix:STIX_Package//node()</marking:Controlled_Structure>
<marking:Controlled_Mapping>
     <marking:ns alias="stix" uri="http://stix.mitre.org/stix-1"/> <!-- 0-n occurrences -->
</marking:Controlled_Mapping>
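
A rough sketch of how a consumer might use such a mapping, in Python with xml.etree.ElementTree. The Controlled_Mapping and ns elements are the notional ones proposed here, not part of STIX 1.1.1; the `urn:example:marking` namespace, the sample documents and the function name are placeholders. The path is simplified to `.//stix:Title` because ElementTree does not support absolute paths or node().

```python
import xml.etree.ElementTree as ET

MARKING_NS = "urn:example:marking"  # placeholder, not the real marking namespace

# Notional marking that carries its own alias -> namespace mapping.
MARKING_XML = """
<marking:Marking xmlns:marking="urn:example:marking">
  <marking:Controlled_Structure>.//stix:Title</marking:Controlled_Structure>
  <marking:Controlled_Mapping>
    <marking:ns alias="stix" uri="http://stix.mitre.org/stix-1"/>
  </marking:Controlled_Mapping>
</marking:Marking>
"""

# Re-shared package whose own alias ("s") differs from the one in the marking.
PACKAGE_XML = """
<s:STIX_Package xmlns:s="http://stix.mitre.org/stix-1">
  <s:STIX_Header><s:Title>Example</s:Title></s:STIX_Header>
</s:STIX_Package>
"""


def controlled_nodes(marking_xml, package_xml):
    """Resolve the controlled structure using the marking's own alias mapping."""
    marking = ET.fromstring(marking_xml.strip())
    ns = {"marking": MARKING_NS}
    path = marking.findtext("marking:Controlled_Structure", namespaces=ns)
    aliases = {
        el.get("alias"): el.get("uri")
        for el in marking.findall("marking:Controlled_Mapping/marking:ns", ns)
    }
    # The package's own prefixes are never consulted, so renaming them is harmless.
    return ET.fromstring(package_xml.strip()).findall(path, aliases)
```

Because the aliases travel with the marking, the lookup keeps working even though the package above declares the STIX namespace under a different prefix.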

Move away from XPath in general, and come up with an alternative capability (Would probably have to be done in a major release)
Are there others?

Thoughts on this welcome. -Mark

jordan2175 commented 9 years ago

The way the current markings are done is really hard to implement and use and is highly XML centric. If it is hard to implement and use, then my fear is people will just not implement it, which does not help anyone.

I would like to see marking done at the object level and have it be the first element in the object. You could use inheritance so that you only needed to put it in at change points. Doing it this way also allows me to read the marking value FIRST and then decide if I should read and process the rest of the data structure that I am currently in. Meaning, if the marking is RED and my tool is not equipped to handle RED items, I should probably stop processing that data right there and then. Not read it all in, process it, and then try to figure out its marking.

I would propose something like this. The marking could also go at the stix:Indicators level or even the STIX package level (yes, I know there is nothing else at the Indicators (plural) level yet). This would also solve the problems for other serializations.

<stix:indicator>
  <marking xsi:type="tlp:TLPMarkingStructureType" tlp:color="GREEN"/>
  .. stuff ..
  .. more stuff..
  <indicator:Description>
    <marking xsi:type="tlp:TLPMarkingStructureType" tlp:color="RED"/>
    Some details about this and that
  </indicator:Description>
</stix:indicator>
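
A minimal sketch of the "read the marking first, then decide" idea in a non-XML serialization, using plain Python dicts. The field names, the sample indicator and the `readable` function are hypothetical, and the rule that unmarked data is rejected is an assumption of this sketch rather than anything proposed above.

```python
# Hypothetical JSON-like indicator: the marking is the first key of each object,
# and it only appears where the marking changes.
INDICATOR = {
    "marking": "GREEN",
    "title": "Suspicious IP",
    "description": {
        "marking": "RED",
        "text": "Some details about this and that",
    },
}


def readable(node, allowed, inherited=None):
    """Return a copy of node without the subtrees this tool may not handle.

    The marking is checked before any other field of the object is touched;
    objects without their own marking inherit the enclosing one.
    """
    marking = node.get("marking", inherited)
    if marking not in allowed:
        return None  # stop here: do not read or process the rest
    kept = {}
    for key, value in node.items():
        if isinstance(value, dict):
            child = readable(value, allowed, marking)
            if child is not None:
                kept[key] = child
        else:
            kept[key] = value
    return kept
```

A tool that only handles WHITE and GREEN would get the indicator back without its RED description, while a tool that handles only WHITE would get nothing at all.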

jgommers commented 9 years ago

Strong proponent of Bret's vision here.

gtback commented 9 years ago

One downside to this approach is that it requires supporting mixed content in almost every field. An alternative would be including a "marking" attribute on most (all?) types, whose value references MarkingStructures defined elsewhere.

jgommers commented 9 years ago

Deferring to @wbolster, who will check it out in a bit - and has a bigger brain than I do :)

gtback commented 9 years ago

To clarify, by "almost every field", I mean "almost every field that currently contains simpleContent". Fields which are complexContent (most of the higher level types) can stay that way.

jordan2175 commented 9 years ago

I have been working on this for JSON, as that was assigned to me on the last community call. What I am leaning towards is a modified version of what I have above: something where we make a top-level marking block and the markings have an ID. Then at each level where there is a change, we add a marking IDREF. This would allow us to define the markings once, but put the markings in the elements that need them. You would have full inheritance and it would be pretty easy to implement in tools since we already have to deal with IDREF content.

wbolster commented 9 years ago

I like @jordan2175's second take on this.

Defining a global (top level) marking block on the package level, and adding attributes to components that can be covered by a specific marking (e.g. Indicator), makes a lot of sense. This would completely eliminate the cumbersome and XML-specific XPath approach that is currently in use, and enable markings to be used when using non-XML STIX representations. (At @intelworks we're particularly interested in JSON-like data models.)

wbolster commented 9 years ago

Additionally, the <marking:Controlled_Mapping> "solution" mentioned by @MarkDavidson is (again) extremely XML-specific, and very cumbersome to use/implement, at least with the XML libraries I know.

jonathanbaker commented 9 years ago

It seems like there are two separate but related topics being discussed here:

1- representing the current marking capability in json
2- developing a new approach to markings for a major revision of the stix language

Item 2 should be informed by item 1. For item 1 to work, it needs to stick closely to the intent of the current marking structure. I am thinking that we might want to separate this into two different issues and allow this issue to focus on the revision of the stix language for version 2.0. Does that make sense?

Next, it is not clear to me that we are all operating off of a shared understanding of the requirements for data marking. In order to advance both of the above topics we should probably get a good handle on these requirements. We created a wiki page to allow us to start tracking these and other requirements and design decisions:

https://github.com/STIXProject/schemas/wiki/Design-Rationale

jordan2175 commented 9 years ago

No, I do not see us discussing two things... Just one. The fact that I am working on JSON is really irrelevant to this. However, working on another implementation lets us more easily see issues, problems and possible solutions to the existing XML implementation.

jordan2175 commented 9 years ago

So as I said above, for XML, I would like to see Marking be a first-class citizen, aka a top-level object, and have an ID as part of it.

<Handling>
    <Marking id="foogreen" xsi:type="tlp:TLPMarkingStructureType" tlp:color="GREEN">
        bla bla bla
    </Marking>
    <Marking id="foored" xsi:type="tlp:TLPMarkingStructureType" tlp:color="RED">
        bla bla bla
    </Marking>
</Handling>

Then in something like Indicators we could do:

<stix:Indicators marking="foogreen">
    <stix:Indicator id="bla">
        <stix:Description marking="foored"> some non restricted text </stix:Description>
        <stix:Description marking="foored"> some restricted text </stix:Description>
        bla bla bla
    </stix:Indicator>
</stix:Indicators>

So we set the top level as green and then do something specific deep inside and set that to red.
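
One way to read the ID-reference idea, sketched in Python with xml.etree.ElementTree. The `Package` wrapper, the `urn:example:stix` namespace, the plain `color` attribute (standing in for the xsi:type/tlp:color pair), the `Title` element and the function name are simplifications made up for this sketch. The inheritance rule used here (the nearest enclosing element with a `marking` attribute wins) is an assumption about the intended semantics.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed variant of the example: markings are defined once
# under Handling and referenced by ID wherever the marking changes.
PACKAGE = """
<Package xmlns:stix="urn:example:stix">
  <Handling>
    <Marking id="foogreen" color="GREEN"/>
    <Marking id="foored" color="RED"/>
  </Handling>
  <stix:Indicators marking="foogreen">
    <stix:Indicator id="bla">
      <stix:Title>some non restricted text</stix:Title>
      <stix:Description marking="foored">some restricted text</stix:Description>
    </stix:Indicator>
  </stix:Indicators>
</Package>
"""


def effective_markings(package_xml):
    """Return (element name, effective color) for every element holding text."""
    root = ET.fromstring(package_xml.strip())
    colors = {m.get("id"): m.get("color") for m in root.iter("Marking")}
    result = []

    def walk(elem, inherited):
        ref = elem.get("marking")
        # A marking reference overrides whatever was inherited from the parent.
        current = colors[ref] if ref is not None else inherited
        if elem.text and elem.text.strip():
            local_name = elem.tag.split("}")[-1]
            result.append((local_name, current))
        for child in elem:
            walk(child, current)

    walk(root, None)
    return result
```

Here the Title inherits GREEN from stix:Indicators, while the Description overrides it with RED, which matches the "green at the top, red deep inside" description.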

johnwunder commented 9 years ago

How would indicators support multiple markings? What if I want to mark something TLP:WHITE but also apply a copyright?

I do agree with @jonathanbaker that it would be helpful to outline these types of use cases in the wiki. For example:

Mark a single node with multiple markings (same or different marking type)
Apply an overall marking to the entire document and override at specific locations
etc. etc.

MarkDavidson commented 9 years ago

If we are talking about an ideal solution for a future version of STIX, I think I'd have this to offer as requirements (Note: If these look like we'd want to adopt them from a STIX perspective, I can edit the wiki):

STIX Data Marking Requirements:

Markings must be first class citizens (e.g., their own object with their own ID)
There can be 0-n relationships between STIX Objects (e.g., Indicator) and Data Markings (So maybe there is a first class relationship object as well)
Markings must be portable. This means STIX Data Markings can be meaningfully implemented across a representative subset of technologies (e.g., Object Oriented languages, XML, JSON, Relational databases)
Marking processing must be deterministic. There must be a clearly defined processing model, such that different implementations will always arrive at the same conclusions from the same data (with the exception of bugs and such). Said another way, Data Marking processing should be representable as a state diagram.

I'm going to step back from XML for a moment and offer this notional structure that meets some of the requirements I've set out:

Markings:
 - TLP:WHITE; ID=1
 - Copyright Mark Davidson; ID=2
Indicators:
 - Indicator 1; File Hash=0xFEEDBEEF
 - Indicator 2; IP=1.2.3.4
Relationships:
- From=Marking_1; To=Indicator_1; Relationship=Marks

My hope is that everyone can read this structure and envision an implementation in their respective technology stack.

-Mark
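
As one possible reading of the notional structure above, here is a small Python sketch using dataclasses. The class names, field names and the `markings_for` helper are hypothetical, and the second relationship (the copyright applied to Indicator 1) is added purely to illustrate the 0-n case; it is not in the structure as posted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marking:
    id: str
    statement: str


@dataclass(frozen=True)
class Indicator:
    id: str
    observable: str


@dataclass(frozen=True)
class Relationship:
    source: str  # ID of the marking
    target: str  # ID of the marked object
    kind: str


MARKINGS = [
    Marking("Marking_1", "TLP:WHITE"),
    Marking("Marking_2", "Copyright Mark Davidson"),
]
INDICATORS = [
    Indicator("Indicator_1", "File Hash=0xFEEDBEEF"),
    Indicator("Indicator_2", "IP=1.2.3.4"),
]
RELATIONSHIPS = [
    Relationship("Marking_1", "Indicator_1", "Marks"),
    # Added for illustration only: a second marking on the same indicator.
    Relationship("Marking_2", "Indicator_1", "Marks"),
]


def markings_for(object_id):
    """Return the statements of all markings that mark the given object."""
    by_id = {m.id: m for m in MARKINGS}
    return [
        by_id[r.source].statement
        for r in RELATIONSHIPS
        if r.kind == "Marks" and r.target == object_id
    ]
```

In this sketch `markings_for("Indicator_1")` yields both the TLP marking and the copyright, and `markings_for("Indicator_2")` yields an empty list, so the same lookup covers zero, one or many markings.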

sbarnum commented 9 years ago

I would assert that at least the first three of those are implementation approaches, not requirements.

If we are talking about requirements for data markings we need to stay focused on actual use case requirements of what sorts of information needs to be represented to support the analysis and exchange of cyber threat information not on the potential implementation approaches for the structure. The latter must flow from the former not the other way around. The "obvious" implementation approaches can often miss important use case requirements. It is fine to start with the "obvious" approaches that solve many requirements but it is important to then test the approach with all of the requirements and evolve/iterate as issues are uncovered.

I hope to have a chance soon to capture as many of the markings use case requirements as I can (in the wiki) from the 2-3 years of community discussions on this topic.

jordan2175 commented 9 years ago

Yes, we need to outline the requirements... Here are some that I have thought about so far; I also added these to the wiki.

1) Ease of consuming the markings and keeping track of what is bound to what.
2) Allow parsers to understand what marking is to be applied to an object, before the said object is read and processed. I like the idea of the first element being read is the marking element.
3) Apply multiple markings to a single element.

I think the solution is some sort of inheritance model. The question is how best to craft that model so that it is clear what is going on and it is EASY to use. I guess I can see a situation where people might have very elaborate markings; however, I am guessing that these markings are somewhat static, meaning that they get reused a lot on subsequent documents. So maybe doing something like a Marking_Group, or following along with Terry's relationship object somehow. I could also see something like:

<Handling>
  <Marking id="foo1234">
    <Detail tlp="red"/>
    <Detail>Copyright</Detail>
  </Marking>
</Handling>

jordan2175 commented 9 years ago

@sbarnum that would be great. And yes, they are implementation elements not use cases, but I wanted to get them down as requirements for the design.

jordan2175 commented 9 years ago

@MarkDavidson I like your idea from your last post

jonathanbaker commented 9 years ago

I updated the wiki with a few of the key requirements that led to our current implementation (STIX 1.1.1 and before):

Apply markings to either a set of fields or a single field (a.k.a field level markings).
Markings need to be applied to structures that are not part of the stix language (i.e. MAEC, CybOX, CVRF, CIQ, etc.)
Apply multiple markings to a single field or structure.
Different organizations have different marking schemes and the stix language should allow for these other marking schemes to be modeled and applied.

johnwunder commented 9 years ago

I added a new data markings page to capture this (took the existing requirements that @jonathanbaker, @jordan2175 and @MarkDavidson wrote up). I also added some basic writeups of the solutions. Please fill in more!

https://github.com/STIXProject/schemas/wiki/Design:-Data-Markings

STIXProject / schemas

Consider modifying the mechanism by which Data Markings are applied #231