CybOXProject / schemas

CybOX Schemas and Schema Development

Too many ways of representing atomic Indicator data #376

Open c-x opened 9 years ago

c-x commented 9 years ago

The standard permits too many ways of describing a single indicator. Because of this, it's really difficult to implement the standard and be fully compliant with it, so it's a blocker to wider adoption (how many vendors talk about supporting STIX/CybOX, and how many really do it?).

Example: To describe an IPv4 address, we can today have the following representations:

If the standard allowed only one way to describe IPv4 addresses, such as CIDR notation (127.0.0.1/32), it would be super easy for anyone to actually implement the standard. And there is no loss of information, because CIDR notation covers all possible IPs or ranges of IPs. Possibly, for convenience, we may want to have 2 formats: the CIDR notation and the single IP notation, but no more than that.

Don't get me wrong, I'm not saying an analyst shouldn't be able to input different IPv4 formats in the software they use (Soltra Edge, for example); I'm saying that particular need is out of the scope of the standard. It is the software's job to transform the input into what the standard expects (the CIDR notation).

In short, I think the problem here is that the standard covers some things that should be part of software specifications and not part of the standard itself.

ikiril01 commented 9 years ago

Agreed. I think many of the existing components of CybOX were designed with the broadest set of use cases in mind, without considering that many of these cases can be handled in other ways, such as through software specifications. This will be a balancing act, but I think there's a strong community consensus towards reducing the number of ways of capturing atomic entities such as IP addresses, and as such Trey and I are making this a high priority for CybOX v3.0.

JasonKeirstead commented 9 years ago

I would propose going with extended CIDR notation (127.0.0.1/255.255.255.0) as this allows you to represent non-contiguous IP spaces and is thus more flexible than shorthand notation.

c-x commented 9 years ago

@JasonKeirstead I don't understand.

255.255.255.0 = 24 bits, so 127.0.0.1/24 == 127.0.0.1/255.255.255.0

How do you define non-contiguous addresses?
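The equivalence can be checked with a minimal sketch using Python's standard `ipaddress` module. The variable names are illustrative only, and `strict=False` is needed because 127.0.0.1 is a host address rather than the network address:

```python
import ipaddress

# strict=False accepts a host address (127.0.0.1) and masks off the
# host bits instead of raising an error.
short_form = ipaddress.ip_network("127.0.0.1/24", strict=False)
long_form = ipaddress.ip_network("127.0.0.1/255.255.255.0", strict=False)

print(short_form == long_form)  # True
print(long_form.prefixlen)      # 24
```

Both spellings parse to the same network object, which is why the dotted netmask adds nothing over the prefix length when the mask is contiguous.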

JasonKeirstead commented 9 years ago

@c-x Non-contiguous netmasks are not common, but they have useful purposes. For example, I can make a netmask that matches any IP that ends in 15: 0.0.0.15/0.0.0.255. A non-contiguous netmask leverages the fact that a netmask is a true bitmask to match IPs that are not contiguous.
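A rough sketch of how such a match could be evaluated, assuming the pattern is written as address/mask and the mask is applied as a plain 32-bit bitmask. The helper name `matches` is hypothetical and not part of CybOX:

```python
import ipaddress

def matches(ip: str, pattern: str) -> bool:
    """Return True if `ip` matches `pattern`, written as address/mask.

    The mask is treated as a raw 32-bit bitmask, so it does not have to
    be a contiguous run of leading 1 bits.
    """
    value_text, mask_text = pattern.split("/")
    value = int(ipaddress.IPv4Address(value_text))
    mask = int(ipaddress.IPv4Address(mask_text))
    candidate = int(ipaddress.IPv4Address(ip))
    # Only the bits selected by the mask take part in the comparison.
    return (candidate & mask) == (value & mask)

print(matches("10.1.2.15", "0.0.0.15/0.0.0.255"))     # True
print(matches("192.168.0.15", "0.0.0.15/0.0.0.255"))  # True
print(matches("10.1.2.16", "0.0.0.15/0.0.0.255"))     # False
```

The first two addresses sit in unrelated networks yet both match, which a prefix-length CIDR block cannot express.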

c-x commented 9 years ago

That's interesting! Maybe we should allow all those notations after all, but require a field that says which one is used. It's similar to the description of #379.

Ex:

    <cybox:Properties xsi:type="IPv4Obj:IPv4ObjectType" format="cidr">
        <IPv4Obj:Value>199.192.156.134/32</IPv4Obj:Value>
    </cybox:Properties>

Where format could be one of: cidr (ex: 32, 24..), single (none, just a single IP), netmask (ex: 255.255.255.0), extended-cidr (ex: 0.0.0.255)

ikiril01 commented 9 years ago

@c-x I can understand the utility of having a field for explicitly specifying IP formatting, though I wonder if it goes too far in the direction of complexity (that we were trying to avoid with #379), as it means that consumers will have to support all possible IP formats that can be specified in this field. That said, having the four formats may not be that big of a burden, and it should still be possible to naturally validate data contained in such a field (e.g., with a regex), even with four different supported formats.
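One way the regex validation could look, as a sketch only. The four format names come from the proposal above, while the patterns and the `is_valid` helper are assumptions. A regex can only check the syntactic shape, so `netmask` and `extended-cidr` share a pattern here; telling a contiguous mask from a non-contiguous one would need a further check:

```python
import re

# One IPv4 octet (0-255) and a dotted-quad address built from it.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
ADDRESS = rf"{OCTET}(?:\.{OCTET}){{3}}"
PREFIX = r"(?:3[0-2]|[12]?[0-9])"  # prefix length 0-32

# Hypothetical mapping from the proposed `format` attribute to a pattern.
PATTERNS = {
    "single": re.compile(ADDRESS),
    "cidr": re.compile(rf"{ADDRESS}/{PREFIX}"),
    "netmask": re.compile(rf"{ADDRESS}/{ADDRESS}"),
    "extended-cidr": re.compile(rf"{ADDRESS}/{ADDRESS}"),
}

def is_valid(value: str, fmt: str) -> bool:
    """Check that `value` has the syntactic shape declared by `fmt`."""
    pattern = PATTERNS.get(fmt)
    return pattern is not None and pattern.fullmatch(value) is not None

print(is_valid("199.192.156.134/32", "cidr"))  # True
print(is_valid("199.192.156.134/33", "cidr"))  # False
```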

JasonKeirstead commented 9 years ago

I agree with @ikiril01; I would prefer to specify one and only one format. I don't see why we need to support multiple formats. KISS. Short CIDR or long-form CIDR are probably the two best options.

c-x commented 9 years ago

Just sharing thoughts :)

I like simple things, but we also need to keep flexibility. For IPv4, I would vote for only the CIDR notation: that's the most common, and it covers all use cases except non-contiguous ranges, which are very rare in DFIR.

If we allow only one IP format, then to be consistent we need to be strict about all other formats (ex: MAC addresses are only written with colons).

The difficulty today is that all those formats can be described, but we don't know which one it is because no field indicates it. In short, think of it as an object from a programming point of view: if obj.format == cidr then ..., etc. That's quite simple compared to today, where we have to add conditions like "if there is a slash present it might be a CIDR ... but I need another if to check the '##'..."
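To illustrate the "obj.format" idea, here is a minimal sketch that dispatches on a declared format instead of guessing from the string. The function name `to_network` and the choice to reject `extended-cidr` are assumptions made for the example, not anything defined by CybOX:

```python
import ipaddress

def to_network(value: str, fmt: str) -> ipaddress.IPv4Network:
    """Interpret `value` according to its declared format."""
    if fmt == "single":
        # A bare address is the same as a /32 network.
        return ipaddress.IPv4Network(f"{value}/32")
    if fmt in ("cidr", "netmask"):
        # strict=False masks off host bits, e.g. 10.0.0.1/24 -> 10.0.0.0/24.
        return ipaddress.IPv4Network(value, strict=False)
    # Non-contiguous masks have no IPv4Network equivalent, so this sketch
    # rejects them rather than guessing.
    raise ValueError(f"unsupported format: {fmt}")

print(to_network("199.192.156.134", "single"))     # 199.192.156.134/32
print(to_network("10.0.0.1/24", "cidr"))           # 10.0.0.0/24
print(to_network("10.0.0.1/255.255.255.0", "netmask"))  # 10.0.0.0/24
```

With the format stated up front, each branch is a single comparison and no inspection of slashes or delimiters inside the value is needed.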

JasonKeirstead commented 9 years ago

My only other argument for long-form CIDR is that there are going to be rules on Cisco devices that you won't be able to represent in CybOX with the short form, because they use non-contiguous ranges.