Northern-Lights / yara-parser

Tools for parsing rulesets using exactly the same grammar as YARA. Written in Go.
MIT License

Single JSON Output Schema #17

Closed: utkonos closed this issue 4 months ago.

utkonos commented 5 years ago

Greetings friends,

I'm one of the maintainers of plyara: https://github.com/plyara/plyara

It looks like we've arrived at many of the same conclusions and are doing very similar things, at least in parsing and JSON output. Our project released version 2.0.0 yesterday, and the announcement sparked a discussion on Twitter here: https://twitter.com/MalwareUtkonos/status/1091533281244471297

Eventually, the author of YARA joined the conversation and mentioned that they are working on a Go implementation of the YARA parser that will output rules in JSON format. This may or may not make parts of your project obsolete, but that's a side topic.

My main proposal is this: let's coordinate on a single schema for the data structure and the JSON output format. We can still have local variation, but I think a single schema that is interoperable among all three projects is a good thing. As a first step, I can post an annotated copy of our full JSON schema along with the reasoning behind various decisions. The short-term goal would be to send both your annotated schema and ours to the core YARA developers. Ideally, they would adopt as much of our "unified" schema as makes sense and then release the official schema when ready. We would then produce JSON that conforms to that official schema. For any fields we can't all agree on, we would add a flag that enables additional local/optional fields in our output.
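To make the "core schema plus opt-in local fields" idea concrete, here is a minimal Go sketch. The field names (`name`, `tags`, `meta`, `condition`, `local`) and the `Export` function are hypothetical placeholders, not the official YARA, plyara, or yara-parser schema. The shared fields are always emitted, and project-specific extras appear only when a flag is set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Rule holds the fields a shared schema might require.
// All field names here are hypothetical, chosen only for illustration.
type Rule struct {
	Name      string            `json:"name"`
	Tags      []string          `json:"tags,omitempty"`
	Meta      map[string]string `json:"meta,omitempty"`
	Condition string            `json:"condition"`
	// Local carries project-specific extras; omitempty drops it when nil.
	Local map[string]any `json:"local,omitempty"`
}

// Export (hypothetical) serializes r, including Local only when includeLocal is true.
func Export(r Rule, includeLocal bool) ([]byte, error) {
	if !includeLocal {
		// r is passed by value, so clearing the field does not affect the caller.
		r.Local = nil
	}
	return json.Marshal(r)
}

func main() {
	r := Rule{
		Name:      "example_rule",
		Tags:      []string{"demo"},
		Condition: "true",
		Local:     map[string]any{"raw_condition": "condition: true"},
	}

	core, err := Export(r, false) // schema-conformant output only
	if err != nil {
		panic(err)
	}
	full, err := Export(r, true) // core fields plus local extras
	if err != nil {
		panic(err)
	}
	fmt.Println(string(core))
	fmt.Println(string(full))
}
```

Keeping the local fields under one optional key means a consumer that only understands the shared schema can ignore that key, while tools that want the extras can opt in.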

Please let me know your thoughts on this proposal.

Here is our open issue on the same subject: https://github.com/plyara/plyara/issues/50

Northern-Lights commented 5 years ago

Hi utkonos, thanks for looping me in. I apologize for having missed this.

I would definitely be on board with using a common serialized format. Some time back, I created a protocol buffer file to express the data format that I am using: https://github.com/Northern-Lights/yara-parser/blob/master/data/data.proto

I am willing to change the data format since I have not yet reached 1.0.0. I would like to use something like the protocol buffer IDL for three reasons: users get a reference for the exact data types to expect, they can generate the data structures in any language they choose, and the format stays JSON-compatible.
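As a rough illustration of why a typed IDL helps consumers, the Go sketch below decodes JSON into a struct shaped as if it had been generated from a `.proto` message, and rejects any key the schema does not define. The `RuleMessage` type, its fields, and `DecodeStrict` are hypothetical and are not taken from `data/data.proto`. The lowerCamelCase JSON keys follow the proto3 JSON mapping convention (a field declared as `rule_name` is serialized as `ruleName`).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// RuleMessage is a hypothetical type, written as if generated from a proto3
// message with fields rule_name, rule_tags, and private.
type RuleMessage struct {
	RuleName string   `json:"ruleName"`
	RuleTags []string `json:"ruleTags"`
	Private  bool     `json:"private"`
}

// DecodeStrict parses data into a RuleMessage and returns an error if the
// input contains keys that RuleMessage does not define.
func DecodeStrict(data []byte) (RuleMessage, error) {
	var m RuleMessage
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields() // surface schema drift instead of silently ignoring it
	err := dec.Decode(&m)
	return m, err
}

func main() {
	// JSON that could have been produced by a tool written in any language.
	good := []byte(`{"ruleName":"example","ruleTags":["a","b"],"private":false}`)
	bad := []byte(`{"ruleName":"example","unexpected":1}`)

	m, err := DecodeStrict(good)
	fmt.Println(m.RuleName, m.RuleTags, err)

	_, err = DecodeStrict(bad)
	fmt.Println("rejected unknown field:", err != nil)
}
```

Strict decoding is one possible design choice; a consumer that prefers forward compatibility could drop `DisallowUnknownFields` and simply ignore keys it does not recognize.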

I'd like to know what you think.