EmersonElectricCo / fsf

File Scanning Framework
Apache License 2.0

META_JAVA_CLASS returns tuples, which are not supported in JSON #40

Closed dcode closed 7 years ago

dcode commented 7 years ago

When dumping JSON info from the META_JAVA_CLASS module, you're leveraging python-javatools. There, they emit each constants_pool entry as a tuple, a type that doesn't exist in JSON. The standard way JSON serializers handle this is to convert the tuple to a list, but you can (and do in this case) end up with a list of mixed types.
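A minimal sketch of that serializer behaviour, using only Python's standard `json` module. The entry value is a stand-in shaped like one constants_pool tuple, not output from python-javatools itself:

```python
import json

# One constants_pool entry as a Python tuple: (index, type, value).
entry = (1, "class", "#32")

# JSON has no tuple type, so the tuple is serialized as an array.
encoded = json.dumps({"constants_pool": [entry]})
print(encoded)

# Decoding gives back a list that mixes an int with strings.
decoded = json.loads(encoded)["constants_pool"][0]
print(decoded, [type(v).__name__ for v in decoded])
```

The round trip shows that the tuple survives only as a list, and that the list carries both a number and strings.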

Namely, running against a sample gave me the following snippet:

                      "META_JAVA_CLASS": {
                            "implements": [],
                            "name": "a.a.a.K",
                            "fields": [],
                            "platform": "1.5",
                            "constants_pool": [
                                [
                                    1,
                                    "class",
                                    "#32"
                                ],

This tuple starts with an int, followed by two strings. The problem arises with tools like Elasticsearch, which try to analyze the resulting list: it cannot treat the values of one list as both numeric and string types, and it blows up. 😢
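One way to spot such arrays before shipping a report downstream is to walk the decoded JSON and flag any list whose scalar members have more than one type. This is a sketch, not part of FSF: the helper name `mixed_type_arrays` and the path notation are made up for illustration, and it treats `int` and `float` as different types, which may be stricter than a given index mapping requires.

```python
import json

def mixed_type_arrays(node, path="$"):
    """Yield (path, type names) for each list holding more than one scalar type."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from mixed_type_arrays(value, f"{path}.{key}")
    elif isinstance(node, list):
        # Only scalars are compared here; nested lists and dicts are
        # checked on their own by the recursive calls below.
        kinds = sorted({type(v).__name__ for v in node
                        if not isinstance(v, (dict, list))})
        if len(kinds) > 1:
            yield path, kinds
        for i, value in enumerate(node):
            yield from mixed_type_arrays(value, f"{path}[{i}]")

# Stand-in report shaped like the snippet above.
report = json.loads(
    '{"META_JAVA_CLASS": {"constants_pool": [[1, "class", "#32"]]}}'
)
problems = list(mixed_type_arrays(report))
print(problems)
```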

It's obviously a bit more work on FSF's part, but one way to handle this particular use case is to remove the integer and preserve order in the constants_pool list, since the int looks like an index, but I could be wrong.

I've also read about some weird approaches of nested dictionaries.

akniffe1 commented 7 years ago

I dug a bit more through python-javatools and saw that the int in question (element 0 of each constants_pool entry) is ultimately generated here in python-javatools:

    def pretty_constants(self):
        """
        the sequence of tuples (index, pretty type, value) of the constant
        pool entries.
        """

        for i in xrange(1, len(self.consts)):
            t, v = self.pretty_const(i)
            if t:
                yield (i, t, v)

The order of that sequence may be significant in some analysis edge cases, so perhaps the way to handle this would be to convert this well-defined tuple into a nested JSON object, so the ultimate output would be this:

                      "META_JAVA_CLASS": {
                            "implements": [],
                            "name": "a.a.a.K",
                            "fields": [],
                            "platform": "1.5",
                            "constants_pool": [
                                { "index": 1,
                                   "type": "class",
                                   "value": "#32"}
                                ],

I think to accomplish this, all that is required is this slight tweak to META_JAVA_CLASS:

def META_JAVA_CLASS(s, buff):
   # Function must return a dictionary
   META_DICT = {}

   options = classinfo_options()
   info = unpack_class(buff)
   META_DICT = classinfo.cli_simplify_classinfo(options, info)

   # Each constants_pool entry is an (index, type, value) tuple; give every
   # position a named key so the int index no longer shares a list with strings.
   _constants_pool = []
   for index, ctype, value in META_DICT['constants_pool']:
      _constants_pool.append({"index": index, "type": ctype, "value": value})
   META_DICT["constants_pool"] = _constants_pool
   return META_DICT

Yields (as snippet):

{
    "implements": [
        "a.a.a.A"
    ], 
    "name": "a", 
    "fields": [], 
    "platform": "1.5", 
    "constants_pool": [
        {
            "index": 1, 
            "type": "class", 
            "value": "#34"
        },
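For trying the conversion without python-javatools installed, the same reshaping can be isolated in a small helper. This is only a sketch: `constants_pool_as_dicts` is a hypothetical name, and the sample tuples are hand-written stand-ins for what `cli_simplify_classinfo` returns.

```python
def constants_pool_as_dicts(pool):
    """Turn (index, type, value) tuples into dicts with named keys."""
    return [{"index": i, "type": t, "value": v} for i, t, v in pool]

# Hand-written sample entries, not real scanner output.
sample = [(1, "class", "#34"), (2, "class", "#36")]
converted = constants_pool_as_dicts(sample)
print(converted)
```

Because every entry keeps its own `index` key, the original numbering survives even if the list is later filtered or reordered.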

Alternatively, as a list that drops the index:

def META_JAVA_CLASS(s, buff):
   # Function must return a dictionary
   META_DICT = {}

   options = classinfo_options()
   info = unpack_class(buff)
   META_DICT = classinfo.cli_simplify_classinfo(options, info)

   # Drop the leading index and keep only [type, value]; list order is preserved.
   _constants_pool = []
   for _index, ctype, value in META_DICT['constants_pool']:
      _constants_pool.append([ctype, value])
   META_DICT["constants_pool"] = _constants_pool
   return META_DICT

Yields (as snippet):

{
    "implements": [
        "a.a.a.A"
    ], 
    "name": "a", 
    "fields": [], 
    "platform": "1.5", 
    "constants_pool": [
        [
            "class", 
            "#34"
        ], 
        [
            "class", 
            "#36"
        ],
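One caveat with the list form: `pretty_constants` only yields an entry when its type is truthy, so the yielded indices are not guaranteed to be contiguous. If an entry is skipped, list position no longer matches the original index. A small sketch with a hypothetical pool (the gap at index 3 is invented for illustration):

```python
# Hypothetical pool with a gap: no entry was yielded for index 3.
pool = [(1, "class", "#34"), (2, "class", "#36"), (4, "class", "#38")]

# The list form keeps only [type, value].
as_lists = [[t, v] for _, t, v in pool]

# Position is now the only index left, and it drifts after the gap.
recovered = [pos for pos, _ in enumerate(as_lists, start=1)]
original = [i for i, _, _ in pool]
print(recovered)  # [1, 2, 3]
print(original)   # [1, 2, 4]
```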

@dcode would either of these work out for you?

compsecmonkey commented 7 years ago

I agree with the approach of translating it to a dictionary. That is cleaner and fits in line with making it easy to build jq filters.
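To illustrate why named keys make filtering easier, here is a sketch in Python of the kind of selection a jq filter such as `.constants_pool[] | select(.type == "class") | .value` would express. The report contents are hypothetical, including the `"utf8"` type label.

```python
# Hypothetical report fragment in the dictionary form.
report = {
    "constants_pool": [
        {"index": 1, "type": "class", "value": "#34"},
        {"index": 2, "type": "utf8", "value": "a"},
        {"index": 3, "type": "class", "value": "#36"},
    ]
}

# Select by field name instead of by position within a tuple.
class_refs = [entry["value"]
              for entry in report["constants_pool"]
              if entry["type"] == "class"]
print(class_refs)  # ['#34', '#36']
```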

compsecmonkey commented 7 years ago

@akniffe1 do you want to turn your comment into a PR?

compsecmonkey commented 7 years ago

Fixed in PR #41 (e3fadc1)