PaloAltoNetworks / minemeld-core

Engine of MineMeld
Apache License 2.0

Fix/taxii 2.1 pagination #400

Open UFOSmuggler opened 1 year ago

UFOSmuggler commented 1 year ago

Description

Fix pagination in TAXII 2.1 requests

Motivation and Context

MineMeld attempts to paginate TAXII 2.1 requests by taking the most recent "modified" value among the STIX objects in a response and using it as the "added_after" URI parameter of the next query.

This is an invalid approach to pagination in TAXII 2.1. The TAXII 2.1 spec has this to say about the "added_after" URI parameter:

The added_after parameter is not in any way related to dates or times in a STIX object or any other CTI object.

Using "modified" this way is therefore inappropriate and may lead to pagination loops, in which MineMeld polls rapidly forever.

Here is an example found in the wild:

$ curl -s -H 'Accept: application/taxii+json;version=2.1' 'https://redacted-taxii2-service/api-root/collections/redacted-collection-uuid/objects/?added_after=2022-09-21T01:16:11.337106Z&limit=100'|grep -oP '"modified": ".+?"'|sort -n|tail -n 3
"modified": "2022-09-21T01:16:11.336277Z"
"modified": "2022-09-21T01:16:11.336554Z"
"modified": "2022-09-21T01:16:11.337106Z"

As can be seen, the most recent "modified" value is identical to the "added_after" value we just requested, so the next query is the same as this one and the poll never advances.
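The stall can be reproduced offline with the three timestamps from the output above. This is only a sketch of the flawed cursor logic as described in this PR, not the actual MineMeld code; the helper name `next_added_after_from_modified` is hypothetical.

```python
def next_added_after_from_modified(objects):
    """Flawed cursor: the newest STIX "modified" value in the page.

    The timestamps share one fixed-width UTC format, so comparing
    them as strings orders them chronologically.
    """
    return max(obj["modified"] for obj in objects)


# The "added_after" value sent with the request in the example above.
requested = "2022-09-21T01:16:11.337106Z"

# The three newest "modified" values returned for that request.
page = [
    {"modified": "2022-09-21T01:16:11.336277Z"},
    {"modified": "2022-09-21T01:16:11.336554Z"},
    {"modified": "2022-09-21T01:16:11.337106Z"},
]

cursor = next_added_after_from_modified(page)

# The cursor has not moved, so the next request repeats this one.
stalled = cursor == requested
```

Because `stalled` is true, every subsequent request carries the same "added_after" value and returns the same page.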

The valid way to paginate with "added_after" is to use the "X-TAXII-Date-Added-Last" HTTP response header, which contains the datetime at which the most recent object in the response was added to the collection. Its timestamp has the same format that "added_after" expects, so changing the code to use this header's value is trivial.

However, it is likely more appropriate to use the "next" key in the TAXII response, as the server then takes care of pagination for the client. Passing the value of the "next" key as the "next" URI parameter of a follow-up request obtains the next page.
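A minimal sketch of following the "next" key, assuming the response body is a decoded TAXII 2.1 envelope with "objects", "more" and "next" keys. `iter_taxii21_objects` and `fetch_page` are hypothetical names, not part of MineMeld; `fetch_page` stands in for whatever performs the HTTP request. Whether other URI parameters such as "limit" should be repeated alongside "next" is left out of the sketch.

```python
def iter_taxii21_objects(fetch_page):
    """Yield objects from every page by following the envelope's "next" value.

    fetch_page is a hypothetical callable: it takes a dict of URI
    parameters and returns the decoded envelope as a dict.
    """
    params = {}
    while True:
        envelope = fetch_page(params)
        for obj in envelope.get("objects", []):
            yield obj
        # Stop when the server reports no further pages or gives no cursor.
        if not envelope.get("more") or "next" not in envelope:
            break
        params = {"next": envelope["next"]}


# Usage with an in-memory stand-in for a server holding two pages.
_pages = {
    None: {"more": True, "next": "page-2", "objects": [{"id": "a"}]},
    "page-2": {"more": False, "objects": [{"id": "b"}]},
}


def fake_fetch(params):
    return _pages[params.get("next")]


ids = [obj["id"] for obj in iter_taxii21_objects(fake_fetch)]
```

The client never inspects timestamps here; the opaque "next" value is simply handed back to the server.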

I have altered the "_poll_taxii21_server" function in taxii2.py to first attempt to use the "next" key, and to fall back to the "X-TAXII-Date-Added-Last" HTTP response header. If neither of these is available, we raise an error, as the TAXII 2.1 server is not meeting the specification.
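The order of preference described above could be sketched roughly as follows. This is an illustration, not the code in this PR: `next_page_params` and `PaginationError` are hypothetical names, and `headers` is assumed to be a mapping of response headers (a plain dict here, so the lookup is case-sensitive, unlike the header mappings of most HTTP libraries).

```python
class PaginationError(Exception):
    """Raised when a response offers no way to request the next page."""


def next_page_params(envelope, headers):
    """Return the URI parameters for the follow-up request.

    Prefers the envelope's "next" value, then falls back to the
    X-TAXII-Date-Added-Last response header as "added_after".
    """
    next_value = envelope.get("next")
    if next_value:
        return {"next": next_value}

    date_added_last = headers.get("X-TAXII-Date-Added-Last")
    if date_added_last:
        return {"added_after": date_added_last}

    raise PaginationError(
        'response has neither a "next" key nor an '
        "X-TAXII-Date-Added-Last header"
    )


# Usage: "next" wins when both are present.
params = next_page_params(
    {"next": "page-2"},
    {"X-TAXII-Date-Added-Last": "2022-09-21T01:16:11.337106Z"},
)
```

Keeping the choice in one small function makes the fallback order explicit and easy to test without a live TAXII server.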

How Has This Been Tested?

I tested this fix against TAXII 2.1 collections served by Medallion 3 and OpenTAXII 0.9.3, which the original code could not poll without looping. With the fix, I was able to fully poll all collections.


welcome-to-palo-alto-networks[bot] commented 1 year ago

:tada: Thanks for opening this pull request! We really appreciate contributors like you! :raised_hands: