MISP / PyMISP

Python library using the MISP Rest API
Other
444 stars 279 forks source link

complex_query using and_parameters does not return expected result #719

Open afreudenreich opened 3 years ago

afreudenreich commented 3 years ago

Hi,

I have tried several queries with two tags. In separate queries (search_index with one tag) the search with the first tag returns 321 events and with the second tag 27 events.

I would assume AND-ing these together into one complex_query as below returns at most 27 events, but it returns 321 events:

tags = ['iss-compliance:true', 'source:tls_stats']
complex_query = misp.build_complex_query(
    and_parameters=tags
)
search_params = {
    'tags': complex_query,
}
filtered_events = misp.search_index(**search_params)

What am I missing?

Thanks!

chrisinmtown commented 3 years ago

I'm guessing the list of tags is treated as an implicit "OR" -- find events with any of the tags in the list. In other words, the outer AND does not propagate into the list. Maybe try building a complex query with two lists, each containing just one tag, "AND" those together?

afreudenreich commented 3 years ago

Thanks for the quick reply Chris.

It is hard to see from the code comments how build_complex_query is supposed to behave. You can provide and_parameters, or_parameters or not_parameters, but it is not obvious how those are combined with each other or other parameters.

Regarding your suggestion, how would I construct the query? I can only provide one "and_parameters", so how can I "AND" two tag lists together?

I guess I can combine any parameter with others (uses "AND" implicitly), but I cannot figure out how to "AND" two event tags together.

Today I tried following variations:

complex_query = misp.build_complex_query(
    and_parameters=['iss_compliance:false', 'source:tls_stats']
)
complex_query = misp.build_complex_query(
    or_parameters=['iss_compliance:false', 'source:tls_stats']
)
complex_query = misp.build_complex_query(
    not_parameters=['iss_compliance:false', 'source:tls_stats']
)

They all gave the same result - 27 events, which does not make sense to me (it seems they are always "AND"ed together). I verified the actual query in debug mode and it consisted of three different dicts with the right AND/OR/NOT key, respectively. I might have messed up my query yesterday - so please ignore the previous results.

I know how to work around this issue (using e.g. the "all" parameter combined with a "tag" parameter), but I would like to understand how to use build_complex_query for more advance use-cases. If anyone can provide one example for a working "tag1 AND tag2" query that would be great...

Thanks!

chrisinmtown commented 3 years ago

The PyMISP test suite runs some searches using the build_complex_query method with or_parameters & not_parameters but not a single one with and_parameters. In my book if a feature hasn't been tested, it's broken. This might fall into that category.

~/git/github/MISP.PyMISP% find . -type f -print0 | xargs -0 grep -i build_complex_query | grep -i and

~/git/github/MISP.PyMISP% find . -type f -print0 | xargs -0 grep -i build_complex_query | grep -i or
./tests/testlive_comprehensive.py:            attributes_types_search = self.admin_misp_connector.build_complex_query(or_parameters=['ip-src', 'ip-dst'])
.. many more deleted..
afreudenreich commented 3 years ago

Thanks for this Chris - looking at the test cases helped me understand things a bit more:

What does not work IMO is following (searching the index in each case):

complex_query = misp.build_complex_query(
    or_parameters=['iss_compliance:false'],
)
# returns 27 events
complex_query = misp.build_complex_query(
    or_parameters=['source:tls_stats'],
)
# returns 321 events
complex_query = misp.build_complex_query(
    or_parameters=['iss_compliance:false', 'source:tls_stats'],
)
# returns 27 events

For the last case I would either expect 321 events (if tags in list are implicitly "OR"ed) or 25 events (tags in list are implicitly "AND"ed - two events with 'iss_compliance:false' do not have the 'source:tls_stats' tag). Instead it always returns 27 events - no matter how the tags are ordered.

In the GUI it's also confusing:

Choosing 'source:tls_stats' OR 'iss_compliance:false' - gives me 321 events (correct). Using this URL: https:///events/index/searchtag:94|90

Choosing 'iss_compliance:false' NOT 'source:tls_stats' - gives me 2 events. It shows as "iss_compliance:false' AND NOT source:tls_stats" (correct, as it is "AND"ed). But it is using this URL: https:///events/index/searchtag:94|!90 Which I would read as "iss_compliance:false' OR NOT source:tls_stats".

In summary: I feel partially enlightened, but at the same time very confused. Any further clarifications welcome :)

Thanks!

iglocska commented 3 years ago

How it works internally:

We have 2 (in some cases 3) buckets of parameters for each keyword:

Each of these 3 buckets get added to the filter conditions via an AND operator

The result is something like this for the above cases:

Choosing 'source:tls_stats' OR 'iss_compliance:false'

"conditions": {
   "AND": {
      "OR": [
          "source:tls_stats",
          "iss_compliance:false"
      ]
   }
}

Choosing 'iss_compliance:false' NOT 'source:tls_stats'

"conditions": {
   "AND": {
      "OR": [
          "iss_compliance:false"
      ],
      "AND": [
          {
              "NOT": "source:tls_stats"
          }
      ]
   }
}

Basically AND in itself as a concept did not exist initially as a 3rd bucket in MISP and was added later on to certain parameters via restsearch. The index's URL parameter representation is a simplified version of adding these parameters with that initial limitation in mind, any value can be either a positive lookup (goes in the OR bucket) or a negative lookup (goes in the NOT bucket) and each parameter is separated by a pipe.

As for why we chose a pipe back in the day: When the parameters were first added to the index filtering, we literally only had the positive lookups. Meaning we could add several values per keyword to search on, but all of the lookups were inclusive filters, meaning they were all ORd.

Hope this helps clear it up!

afreudenreich commented 3 years ago

Thanks Andras - that is helpful as well. I guess you are mostly describing what happens under the hood when using the GUI?

My specific problem is still following, when using the python library: how can I do a query with two event tags combined by AND? I assume I would need a complex query and that is where I have most issues:

A) When I search the index directly with an array of tags, they are being "OR"ed together:

search_params = {
    'tags': ['mozilla_compliance:false', 'iss_compliance:false']
}

When I use a complex search instead I would expect following to give the same result:

complex_query = misp.build_complex_query(
    or_parameters=['mozilla_compliance:false', 'iss_compliance:false']
)
search_params = {
    'tags': complex_query
}

But the result is different (it only returns the events from one parameter).

B) Running 'iss_compliance:false' AND NOT 'mozilla_compliance:false' in the GUI returns 1018 events. But running following with several variations combining or_parameters, and_parameters and not_parameters does never return 1018 events, but just the events from parameter 'iss_compliance:false' (2518):

complex_query = misp.build_complex_query(
    or_parameters=['iss_compliance:false'],
    not_parameters=['mozilla_compliance:false'],
)
search_params = {
    'tags': complex_query
}

And as mentioned before, I am not able to put together an "AND" search for two event tags in code. I could work around this by using a NOT and subtracting from the total of the other parameter, but as discussed above that does not work either for me.

In summary I wonder if there is anything wrong with build_complex_query or if it is just my code...

Rafiot commented 3 years ago

Okay, so the problem is related to the complex query builder, because it only generates a "flat" dictionary: one key AND, OR, NOT, at the same level in the dict, so that's definitely not going to work.

I started to change it, but it barf in interesting ways:

{
  "AND": {
    "OR": [
      "tlp:white___test"
    ],
    "AND": [
      {
        "NOT": [
          "tlp:amber___test",
          "foo_double___test"
        ]
      }
    ]
  }
}
[main] - [internal], line ??</pre></div></pre>{                                                         
    "code": 500,
    "name": "SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;= (&#039;tlp:white___test&#039;)) OR (`Tag`.`name` like = (Array)))&#039; at line 1",
    "message": "SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near &#039;= (&#039;tlp:white___test&#039;)) OR (`Tag`.`name` like = (Array)))&#039; at line 1",
    "url": "\/events\/restSearch",                                             
    "error": {                                                                 
        "errorInfo": [                                                         
            "42000",                                                           
            1064,                                                              
            "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '= ('tlp:white___test')) OR (`Tag`.`name` like = (Array)))' at line 1"
        ],                                                                     
        "queryString": "SELECT `Tag`.`id` FROM `misp`.`tags` AS `Tag` LEFT JOIN `misp`.`organisations` AS `Organisation` ON (`Tag`.`org_id` = `Organisation`.`id`) LEFT JOIN `misp`.`users` AS `User` ON (`Tag`.`user_id` = `User`.`id`)  WHERE ((`Tag`.`name` like = ('tlp:white___test')) OR (`Tag`.`name` like = (Array))
)"                                                                             
    }                                                                          
}
chrisinmtown commented 3 years ago

I would like to propose an improvement to the current single sentence in the docstring (seriously guys, what were you thinking). But I don't grasp this yet. Does complex_query search the events metadata only (basically the events index)? Does it assume tags? How do you specify for complex_query which field should be searched; for example, if I want to search the event's info field or the orgc field? Just to make sure, can complex_query match attribute values on an event?

Rafiot commented 3 years ago

No rush in updating the documentation, I don't think anyone can use this approach to query MISP unless they're using the UI, and we're going to find exceptions like that for the next few days because it's not properly tested at this point.

AFAIK, the AND/OR/NOT system is generic and can be used on most (if not all) settings (org, tag, type, ...), but as you can see, creating something that work is not trivial (again, unless you use the UI in the REST client on MISP).

chrisinmtown commented 3 years ago

Well without documentation I also don't think anyone can use this approach. :/ post here if you would like me to contribute a docstring for complex_query

github-germ commented 2 years ago

Was there any resolution? Thanks.

afreudenreich commented 2 years ago

Thanks everyone for your help. Just FYI: as I could not reproduce consistent behaviour with the complex search (and I still do not fully understand the behaviour under the hood) I have worked around this issue in my code by not using a complex search. If you have changed the behaviour (and documentation) of the complex search in a recent version and want me to do some testing, please let me know. Otherwise feel free to close this issue.