certtools / ieps

IntelMQ Enhancement Proposals

IEP004: specification of the format and type field #3

Closed: ghost closed this issue 2 years ago

ghost commented 3 years ago

As a follow-up to the discussion in the hackathon, we need to define the format of the "format" field.

We have several related fields:

* the format name
* the format version
* the type in that version

One suggestion in the hackathon was that the type should be a child of the format field. So I start the discussion with two proposals:

A:

{
    "format": "intelmq",
    "version": 1,
    "type": "event",
    ...
}

B:

{
    "format": {
        "name": "intelmq",
        "version": 1,
        "type": "event"
    },
    ...
}

Valid values for the type field are event and report. The type field also replaces the __type field that we currently have in the payload.
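A consumer could accept either proposal by normalising the header before dispatching on it. The following is a minimal Python sketch, assuming messages are already parsed from JSON into dicts; `read_header` and `Header` are hypothetical names, not part of IntelMQ.

```python
from collections.abc import Mapping
from typing import Any, NamedTuple

VALID_TYPES = {"event", "report"}  # valid values per the proposal above


class Header(NamedTuple):
    name: str
    version: int
    type: str


def read_header(message: Mapping[str, Any]) -> Header:
    """Extract (name, version, type) from a message in variant A or B (hypothetical helper)."""
    fmt = message["format"]
    if isinstance(fmt, Mapping):
        # Variant B: name, version and type are nested below "format"
        header = Header(fmt["name"], fmt["version"], fmt["type"])
    else:
        # Variant A: flat top-level keys
        header = Header(fmt, message["version"], message["type"])
    if header.type not in VALID_TYPES:
        raise ValueError(f"unknown type {header.type!r}")
    return header


variant_a = {"format": "intelmq", "version": 1, "type": "event"}
variant_b = {"format": {"name": "intelmq", "version": 1, "type": "event"}}
print(read_header(variant_a) == read_header(variant_b))  # True
```

Both variants carry the same information; the choice between them is about readability and collision avoidance, not expressiveness.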

cc @aaronkaplan @adulau @certbe-trey

aaronkaplan commented 3 years ago

If I recall correctly, Trey had a good reason for variant B, right?

On 22.04.2021, at 17:33, Wagner wrote:

> As a follow-up to the discussion in the hackathon, we need to define the format of the "format" field. [...]

ghost commented 3 years ago

To implement this, we need to reach a conclusion on the format.

ping @aaronkaplan @adulau @certbe-trey @otmarlendl @pharook @certtools/intelmq-maintainers

adulau commented 3 years ago

I prefer variant A: it's clearer and has less nesting. Concerning the meta, I would recommend prefixing each key with a namespace related to the format. A prefix is a nice way to avoid any collision issues.

Here is a proposal:

{
    "format": "ail",
    "version": 1,
    "type": "ail2ail",
    "meta": {
       "ail:uuid": "03c51929-eeab-4d47-9dc0-c667f94c7d2c",
       "ail:uuid_org": "28bc3db3-16da-461c-b20b-b944f4058708"
    }
}
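The prefix idea can be sketched as follows: each producer namespaces its meta keys with its format name, so a consumer can merge meta from several sources without one overwriting the other. Minimal Python sketch; `namespaced` and `split_key` are hypothetical helpers, the colon separator follows the `ail:uuid` example above, and the values are placeholders.

```python
def namespaced(prefix: str, meta: dict[str, str]) -> dict[str, str]:
    """Prefix every meta key with '<prefix>:' (hypothetical helper)."""
    return {f"{prefix}:{key}": value for key, value in meta.items()}


def split_key(key: str) -> tuple[str, str]:
    """Split 'ail:uuid' into ('ail', 'uuid'); the prefix ends at the first colon."""
    prefix, sep, name = key.partition(":")
    if not sep:
        raise ValueError(f"meta key {key!r} has no prefix")
    return prefix, name


# Two sources both use a plain 'uuid' key; the prefixes keep them apart.
merged = {
    **namespaced("ail", {"uuid": "example-id-1"}),
    **namespaced("intelmq", {"uuid": "example-id-2"}),
}
print(sorted(merged))             # ['ail:uuid', 'intelmq:uuid']
print(split_key("ail:uuid_org"))  # ('ail', 'uuid_org')
```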

Concerning the full proposal for the payload part, I would propose something simpler (not sure why the payload was in a separate JSON object):

{
    "format": "ail",
    "version": 1,
    "type": "ail2ail",
    "meta": {
       "ail:uuid": "03c51929-eeab-4d47-9dc0-c667f94c7d2c",
       "ail:uuid_org": "28bc3db3-16da-461c-b20b-b944f4058708"
    },
    "payload": {
        "raw" : "MjhiYzNkYjMtMTZkYS00NjFjLWIyMGItYjk0NGY0MDU4NzA4Cg=="
    }
}

That way we have five required keys: format, version, type, meta and payload. The keys inside meta are optional, and in the payload perhaps only raw is required at a minimum.
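A receiver could check this envelope before handing the payload on. Minimal Python sketch; `validate_envelope` is a hypothetical name, and it assumes (as in the example above) that meta keys are prefixed with the format name and that `raw` holds base64 data.

```python
import base64
import binascii

REQUIRED_KEYS = ("format", "version", "type", "meta", "payload")


def validate_envelope(message: dict) -> list[str]:
    """Return the problems found in an envelope; an empty list means it looks valid."""
    problems = [f"missing key {key!r}" for key in REQUIRED_KEYS if key not in message]
    if problems:
        return problems
    prefix = f"{message['format']}:"
    for key in message["meta"]:
        # meta keys are optional, but those present should carry the format prefix
        if not key.startswith(prefix):
            problems.append(f"meta key {key!r} lacks prefix {prefix!r}")
    raw = message["payload"].get("raw")
    if raw is None:
        problems.append("payload has no 'raw' field")
    else:
        try:
            base64.b64decode(raw, validate=True)  # reject non-alphabet characters
        except binascii.Error:
            problems.append("payload 'raw' is not valid base64")
    return problems


envelope = {
    "format": "ail",
    "version": 1,
    "type": "ail2ail",
    "meta": {"ail:uuid": "03c51929-eeab-4d47-9dc0-c667f94c7d2c"},
    "payload": {"raw": "MjhiYzNkYjMtMTZkYS00NjFjLWIyMGItYjk0NGY0MDU4NzA4Cg=="},
}
print(validate_envelope(envelope))  # []
```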

otmarlendl commented 3 years ago

MIME-Types?

aaronkaplan commented 3 years ago

On 27.04.2021, at 17:30, Alexandre Dulaunoy wrote:

> Concerning the meta, I would recommend to make a prefix in the key [...]

Works for me. Good idea about the prefix!

> Concerning the full proposal for the payload part, I would propose something simpler [...]

Minor nitpicking wish from my side: replace the JSON key "payload" with "data".

aaronkaplan commented 3 years ago

On 28.04.2021, at 12:35, Otmar Lendl wrote:

> MIME-Types?

It can be part of the specification / standardisation document, but not part of the content IMHO. MIME types are usually set by HTTP servers or similar.

bernhardreiter commented 3 years ago

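Again, I don't feel I can comment on the specific proposals, as my understanding of what this is for (the use case) feels incomplete.

Maybe general remarks help you:

* For an interchange, the format of a channel would be meta information about the channel, not about a single message of the channel. So in most cases I know, the info should not be in each message.
* With data formats, I've seen a number of cases where version numbers are not used (anymore). So unless their usage is really forced, they can be left out.
* Leaving room for `type` without having at least two types and their contents defined and understood may be too much. If those definitions come up, they could be added with the next revision of the whole format. (A parser cannot be written without a specific definition of each type's content anyway.)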

ghost commented 3 years ago
> For an interchange, the format of a channel would be meta information about the channel, not about a single message of the channel. So in most cases I know, the info should not be in each message.

We're not talking about the channel, just about the format of the message transmitted over the channel.

> With data formats, I've seen a number of cases where version numbers are not used (anymore). So unless their usage is really forced, they can be left out.

Until now, the IntelMQ data format has been unversioned, although it has already been adapted over the last years. Once we send messages in this format, we will face the issue that not all connected instances run the same version, and messages will need to be converted (e.g. because a field was added in a newer version).

> Leaving room for `type` without having at least two types and their contents defined and understood may be too much. If those definitions come up, they could be added with the next revision of the whole format. (A parser cannot be written without a specific definition of each type's content anyway.)

Currently, IntelMQ understands two types: report and event.
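The version conversion described above is often handled as a chain of per-version upgrade steps applied until a message reaches the version the receiver understands. Minimal Python sketch, assuming variant A's flat `version` key; the upgrade step and the field it adds (`extra.example`) are hypothetical and do not describe real IntelMQ format changes.

```python
from typing import Callable

Message = dict
Upgrade = Callable[[Message], Message]

CURRENT_VERSION = 2  # hypothetical: the newest version this receiver understands


def _v1_to_v2(message: Message) -> Message:
    # Hypothetical change: version 2 added an optional field with a default value.
    upgraded = dict(message, version=2)  # copy, so the input is left untouched
    upgraded.setdefault("extra.example", None)
    return upgraded


UPGRADES: dict[int, Upgrade] = {1: _v1_to_v2}


def upgrade(message: Message) -> Message:
    """Apply upgrade steps one version at a time until CURRENT_VERSION is reached."""
    version = message["version"]
    if version > CURRENT_VERSION:
        raise ValueError(f"version {version} is newer than supported {CURRENT_VERSION}")
    while message["version"] < CURRENT_VERSION:
        message = UPGRADES[message["version"]](message)
    return message


print(upgrade({"format": "intelmq", "version": 1, "type": "event"}))
# {'format': 'intelmq', 'version': 2, 'type': 'event', 'extra.example': None}
```

Each step only needs to know about one version boundary, and a message newer than the receiver is rejected rather than guessed at.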

aaronkaplan commented 3 years ago

On 29.04.2021, at 10:33, Bernhard E. Reiter wrote:

> Again I don't feel I can comment on the specific proposals, as my understanding of what this is for (the use case) feels incomplete.

We had a pretty good discussion on the call last Thursday about the background ideas.

> Maybe general remarks help you:
>
> * For an interchange, the format of a channel would be meta information about the channel, not about a single message of the channel. [...]
> * With data formats, I've seen a number of cases where version numbers are not used (anymore). [...]

Actually, when you link multiple IntelMQ instances (or IntelMQ and something else) in a live data flow, you might need what we described.

> * Leaving room for type without having at least two types and their contents defined and understood may be too much. [...]

See the ideas of MISP references. It's extensible, I think.

So IMHO all of this does make sense. The background, as you mentioned above, is that we would like to link multiple IntelMQ instances and have a flow. Flows may be interrupted and must be able to resume at any moment. That's the context, and it is why you will need the header info at least periodically.

ghost commented 3 years ago

> On 29.04.2021, at 10:33, Bernhard E. Reiter wrote:
> > Again I don't feel I can comment on the specific proposals, as my understanding of what this is for (the use case) feels incomplete.
>
> We had a pretty good discussion on the call last Thursday about the background ideas.

I agree with @bernhardreiter that it's hard or impossible to comment on the proposals if the intention is unknown.

pharook commented 3 years ago

> On 28.04.2021, at 12:35, Otmar Lendl wrote:
> > MIME-Types?
>
> It can be part of the specification / standardisation document, but not part of the content IMHO. MIME types are usually set by HTTP servers or similar.

We have found MIME types quite handy for "opaque" data, like attachments, original contents, and so on. In IDEA we have ContentType (along with Content itself): it's quite useful to know at least whether base64'd content is text/plain, application/octet-stream, or even text/csv or application/json. :) We also have an optional ContentCharset for text attachments.
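The benefit of carrying a content type next to opaque data can be sketched like this: the receiver decodes the base64 content and only interprets it further when the declared type says it can. Minimal Python sketch; the attachment is modelled as a plain dict whose `Content`, `ContentType` and `ContentCharset` keys are named after the fields mentioned in the comment, and it assumes `Content` is base64 as described there. It does not reproduce the full IDEA attachment structure.

```python
import base64
import json


def decode_attachment(attachment: dict):
    """Decode base64 'Content' according to 'ContentType' and optional 'ContentCharset'."""
    data = base64.b64decode(attachment["Content"])
    media_type = attachment.get("ContentType", "application/octet-stream")
    if media_type == "application/json":
        return json.loads(data)  # json.loads accepts UTF-8 encoded bytes
    if media_type.startswith("text/"):
        charset = attachment.get("ContentCharset", "utf-8")
        return data.decode(charset)
    return data  # unknown or binary type: hand back the raw bytes


text_att = {
    "Content": base64.b64encode("grüße".encode("latin-1")).decode("ascii"),
    "ContentType": "text/plain",
    "ContentCharset": "iso-8859-1",
}
print(decode_attachment(text_att))  # grüße
```

Without the charset, the receiver could not tell that these bytes are Latin-1 rather than UTF-8.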

otmarlendl commented 3 years ago

> On 28.04.2021, at 12:35, Otmar Lendl wrote:
> > MIME-Types?
>
> It can be part of the specification / standardisation document, but not part of the content IMHO. MIME types are usually set by HTTP servers or similar.

Aaron,

What we are basically doing is prefixing the actual content ("payload", "body", ... whatever) with metadata that describes what the following bunch of bytes is supposed to be (and thus how to parse and interpret them).

That problem is not new. It appears in a lot of protocols. You need at least three things to accomplish this:

1) A framing: what is the meta-data, and where does the content actually start?
2) A format for the meta-data: how are individual bits of meta-data represented on the wire?
3) A taxonomy (incl. a registry for values) describing possible types of payload and names for them.

The solution for email and HTTP is:

1) \r\n\r\n is the separator.
2) A list of headers looking like "Tag: value\r\n", with some predefined meanings for tag values.
3) https://www.iana.org/assignments/media-types/media-types.xhtml

I'm not suggesting that we copy 1) and 2) from Mail&HTTP, but for 3), I don't think we should reject out of hand to re-use the established media-types registry.

For example, there is already a Media-Type for STIX (https://www.iana.org/assignments/media-types/application/stix+json), so if IntelMQ 5.2 starts to support STIX, then we have the appropriate value. It also helps if we ever move to other transport protocols.

It isn't terribly important, I'm just wondering why we are re-inventing the wheel.
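The three ingredients listed above (framing, header format, registry of type names) can be illustrated with a mail/HTTP-style message. Minimal Python sketch for illustration only: as the comment says, the suggestion is not to copy the framing and header syntax, just to reuse the media-type names. `parse_framed` is a hypothetical helper.

```python
import json


def parse_framed(message: bytes) -> tuple[dict[str, str], bytes]:
    r"""Split a message into headers and body.

    1) Framing: the first b"\r\n\r\n" ends the metadata.
    2) Header format: one 'Tag: value' pair per CRLF-terminated line.
    """
    head, sep, body = message.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    headers = {}
    for line in head.decode("ascii").split("\r\n"):
        tag, _, value = line.partition(":")
        headers[tag.strip().lower()] = value.strip()  # header tags are case-insensitive
    return headers, body


# 3) Taxonomy: the Content-Type value is a name from the IANA media-types registry.
raw = b'Content-Type: application/stix+json\r\n\r\n{"type": "bundle"}'
headers, body = parse_framed(raw)
if headers["content-type"] == "application/stix+json":
    print(json.loads(body)["type"])  # bundle
```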

aaronkaplan commented 3 years ago

Otmar,

Yes, I think where you went off in another direction is that we were talking about the content, while you were talking about the header.

But whatever... as said, not that important. It can be sent as part of the emitter (RabbitMQ, HTTP server, ...) or in the header of the new IDF format.

What is important is that we quickly have a definition. I will need that also for another project :)

On 29.04.2021, at 17:12, Otmar Lendl wrote:

> I'm not suggesting that we copy 1) and 2) from Mail&HTTP, but for 3), I don't think we should reject out of hand to re-use the established media-types registry.

No point in explaining HTTP to me, or that it makes sense to use an existing MIME type from IANA. It does :) But that's quite obvious. However, I think you were putting MIME types (usually a header thing) into the discussion of content, where they don't belong. Anyway, as said, not that important.

waldbauer-certat commented 3 years ago

As @wagner-certat already wrote, I'd go for B.

Nesting lets us avoid possible name collisions (sure, you could use "format_name": "intelmq" instead of "format": { "name": "intelmq" }, but IMHO nesting looks cleaner). Furthermore, it leaves room for future changes to the IntelMQ protocol.

pharook commented 3 years ago

Is such granularity for versioning needed? I mean, couldn't format and type just be merged into "intelmq-event" and "intelmq-report"? Also, couldn't the version be part of the name ("intelmq-event-01.00")? I mean, what will it be used for? If I know that I support only a specific format and a specific version, I'll need to check just that one string (or even a prefix, if the version is lexicographically stable). Again, I tend to dislike bloating the format by making things too structured.
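The merged-identifier alternative can be sketched as follows: one string such as "intelmq-event-01.00" carries format, type and version, and a consumer that supports one combination only needs a prefix check. Minimal Python sketch; the `<format>-<type>-<MM.mm>` layout and the helper names are assumptions taken from the example string, not an agreed convention.

```python
SUPPORTED_PREFIX = "intelmq-event-01."  # any 01.x minor version of intelmq events


def accepts(identifier: str) -> bool:
    """A consumer that only handles intelmq events with major version 01."""
    return identifier.startswith(SUPPORTED_PREFIX)


def split_identifier(identifier: str) -> tuple[str, str, str]:
    """Split '<format>-<type>-<version>' from the right, so the format name may contain '-'."""
    fmt, type_, version = identifier.rsplit("-", 2)
    return fmt, type_, version


print(accepts("intelmq-event-01.00"))   # True
print(accepts("intelmq-report-01.00"))  # False
print(split_identifier("intelmq-event-01.00"))  # ('intelmq', 'event', '01.00')
# Zero-padding keeps plain string comparison in version order ("1.10" < "1.9" would not):
print("intelmq-event-01.10" > "intelmq-event-01.09")  # True
```

The zero-padded version is what makes the "lexicographically stable" prefix check work; without it, string order and version order diverge.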

ghost commented 3 years ago

> Is such granularity for versioning needed? I mean - cannot format and type be just merged into "intelmq-event" and "intelmq-report"? [...] Again - I tend to dislike bloating the format by making things too structured.

@adulau You also have experience working with different formats and networking between applications. Do you have an opinion or recommendation?

aaronkaplan commented 3 years ago

Trying to summarize:

adulau commented 3 years ago

FYI, the AIL format description to be used for the new AIL sync. It's in line with the above discussions.

ghost commented 3 years ago

Thanks, looks good from our side! We'd prefer to keep the same format in IntelMQ, as the requirements are the same. For the inter-message linking, we need an extension of course.

aaronkaplan commented 2 years ago

This was resolved in the discussion / call on 25 June 2022. It is now clearly defined as a JSON schema in commit a0c00a561c5d916fd9bcd3b8423a217a00ef1d7d.