[TI_AbuseCH] Add support for IOC expiration #7322

Closed kcreddy closed 6 months ago

kcreddy commented 1 year ago

Individual tracking issue for ti_abusech package for adding IOC expiration support. Meta Issue - https://github.com/elastic/integrations/issues/5369

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

kcreddy commented 1 year ago

Hey @jamiehynds @andrewkroh

Looking at the docs for the 4 datastreams we support (url, malware, malwarebazaar, and threatfox), and after conversations with folks from AbuseCH/Spamhaus, the following is the analysis of each datastream for supporting IOC expiration in AbuseCH:

1. URL

2. Malware (similar to URL case, just the API is different)

3. Malware Bazaar

4. Threatfox

The transform could then query only these 4 new datastreams as source indices. Since none of them have expiration dates and we fetch the full list of unexpired indicators, the transform retention behaviour could be the same as ti_recordedfuture: https://github.com/elastic/integrations/blob/main/packages/ti_recordedfuture/elasticsearch/transform/latest_ioc/transform.yml#L25
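
For illustration, a minimal sketch of the retention part of such a transform (the values are assumptions, not final settings):

    # transform.yml (illustrative): drop documents from the destination once they age out,
    # mirroring the ti_recordedfuture retention approach
    frequency: 30s
    retention_policy:
      time:
        field: "@timestamp"
        max_age: 120h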

Let me know what you think about creating these 4 new datastreams to support IOC expiration, and about the overall approach.

andrewkroh commented 1 year ago

Some thoughts on this after speaking in person.

kcreddy commented 1 year ago

Hey @andrewkroh

> What is the difference in the data format between the currently used API and the proposed alternative API/download?

diff --git a/Users/kcreddy/Downloads/url-existing.json b/Users/kcreddy/Downloads/url-new.json
index 3b05f48d4..d6f254acf 100644
--- a/Users/kcreddy/Downloads/url-existing.json
+++ b/Users/kcreddy/Downloads/url-new.json
@@ -1,21 +1,17 @@
+"2711415": [
 {
-    "id": 2711415,
-    "urlhaus_reference": "https:\/\/urlhaus.abuse.ch\/url\/2711415\/",
-    "url": "http:\/\/23.236.203.81\/gEUBYPspBNL33.bin",
+    "urlhaus_link": "https://urlhaus.abuse.ch/url/2711415/",
+    "url": "http://23.236.203.81/gEUBYPspBNL33.bin",
     "url_status": "online",
-    "host": "23.236.203.81",
-    "date_added": "2023-09-13 07:39:06 UTC",
+    "dateadded": "2023-09-13 07:39:06 UTC",
+    "last_online": "2023-09-13 07:39:06 UTC",
     "threat": "malware_download",
-    "blacklists": {
-        "spamhaus_dbl": "not listed",
-        "surbl": "not listed"
-    },
     "reporter": "abuse_ch",
-    "larted": "true",
     "tags": [
         "encrypted",
         "GuLoader",
         "rat",
         "RemcosRAT"
-    ]
-}
\ No newline at end of file
+    ],
+}
+],

The new JSON is missing 2 fields, namely blacklists and larted. More details on these fields here: https://urlhaus-api.abuse.ch/#urls-recent

> Can we create only a single transform?

As per the transform API, the source can be an array of index patterns, so we can define multiple sources and 1 destination index that contains all active IOCs.
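
For illustration, a minimal sketch of a single transform with multiple source indices (the index patterns, destination name, and unique key are assumptions, not final names):

    # transform.yml (illustrative)
    source:
      index:
        - "logs-ti_abusech.url_expiration-*"
        - "logs-ti_abusech.malware_expiration-*"
        - "logs-ti_abusech.malwarebazaar_expiration-*"
        - "logs-ti_abusech.threatfox_expiration-*"
    dest:
      index: "logs-ti_abusech_latest.ioc"     # one destination holding all active IOCs
    latest:
      unique_key:
        - event.dataset
        - abusech.id                          # hypothetical per-indicator key
      sort: "@timestamp"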

> What data streams will be annotated with labels.is_ioc_transform_source?

Only the new source datastreams will have that label.
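
One possible way to annotate them, sketched here as a constant_keyword field (a sketch only; the exact mechanism and file layout are assumptions):

    # fields/fields.yml of each new source datastream (illustrative)
    - name: labels.is_ioc_transform_source
      type: constant_keyword
      value: "true"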

> Will users get duplicate alerts if they enable both data streams that pull from a given source (i.e. they enable both the existing and new urlhaus data streams)?

If they are using both datastreams (existing and new), we have a way to solve it, but it needs manual setup.

kcreddy commented 1 year ago

@P1llus mentioned another approach: modify the existing datastream so the user can switch between a dataset that supports IOC expiration and one that does not. If the user has opted in (via a boolean switch) to IOC expiration, then the new URL will be used.

Proposed steps for this approach:

This way, users are forced to use only one of the current or new URLs, depending on their choice of IOC expiration support. Since the Detection Rules query with NOT is_ioc_transform_source: *, i.e. only documents where the field does not exist, they will only ever query either the existing dataset (without IOC expiry support) or the new destination indices created by the transform.
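
For illustration, a minimal sketch of how the boolean switch could select the endpoint in the URL datastream's agent stream template (the variable name and URLs are assumptions for the sketch, not the final values):

    # agent/stream/httpjson.yml.hbs (illustrative)
    {{#if enable_ioc_expiration}}
    request.url: https://urlhaus.abuse.ch/downloads/json_recent/
    {{else}}
    request.url: https://urlhaus-api.abuse.ch/v1/urls/recent/
    {{/if}}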

kcreddy commented 1 year ago

I prefer the new approach (https://github.com/elastic/integrations/issues/7322#issuecomment-1717434388), which keeps the same datastreams, over the previous approach of creating new datastreams, since the latter requires manual setup.

@andrewkroh WDYT?

andrewkroh commented 1 year ago

> This way, users are forced to use only one of the current or new URLs, depending on their choice of IOC expiration support.

+1, let's try this.

andrewkroh commented 1 year ago

One lingering question in my mind, after looking at the diff in event fields for URLhaus, is whether this small delta justifies keeping support for the old API. Do we think our users will be affected if we change to only use the API that supports expiration? I'm thinking that as long as indicator match rules continue to work, users won't care (a loss of fields like larted won't affect indicator match).

kcreddy commented 11 months ago

A few things I tried with the existing httpjson input didn't quite work, since the response is a compressed ZIP containing a JSON file.

Options tried out with httpjson:

  1. After decoding as application/zip, split on the "body" field as a map. Config:

    response.decode_as: application/zip
    response.split:
      target: body
      type: map
      keep_parent: false

    Error: invalid target: body accessing 'response.split'

  2. Copy the body into the body.data template value as JSON and then split. Config:

    response.decode_as: application/zip
    response.transforms:
      - set:
          target: body.data
          value: '[[ toJSON .last_response.body ]]'
          value_type: json
    response.split:
      target: body.data
      type: map
      keep_parent: false

    Agent logs show only 1 event published ("request finished: 1 events published"), which indicates the split is not working. Error inside the Elasticsearch logs: failed to execute bulk action create: source[n/a, actual length 32 mb, max length 2 kb]... Fail to parse: limit of 10000 fields has been exceeded while adding new fields [9898]

I would like to use the CEL input to achieve this, as it seems easier. From a user standpoint, they would have to switch inputs.

kcreddy commented 7 months ago

After some tests, a full download of all indicators for ThreatFox, Malware, and MalwareBazaar has to be ruled out due to the performance limitations experienced by the transform. The compressed ZIP files are > 500 MB, and only an agent with very large resources is able to ingest all the indicators. The transform itself is not able to fully move the data from the source into the destination.

So, for these 3 datastreams (ThreatFox, Malware, and MalwareBazaar), the existing ingest pipelines are modified to add a deleted_at timestamp, which takes its value from a user-configurable Indicator Expiration Time. Users set when they want indicators to expire, defaulting to 90d.
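
For illustration, a minimal sketch of such a pipeline change as a script processor (the field names and the ioc_expiration_duration parameter are assumptions, and @timestamp is assumed to still be an ISO 8601 string at this point):

    # illustrative ingest pipeline snippet: deleted_at = @timestamp + expiration duration
    - script:
        lang: painless
        params:
          ioc_expiration_duration: "90d"   # would be wired to the user-configurable setting
        source: |
          ZonedDateTime start = ZonedDateTime.parse(ctx['@timestamp']);
          String v = params.ioc_expiration_duration;
          long n = Long.parseLong(v.substring(0, v.length() - 1));
          ZonedDateTime expiry = start;
          if (v.endsWith("d")) {
            expiry = start.plusDays(n);
          } else if (v.endsWith("h")) {
            expiry = start.plusHours(n);
          } else if (v.endsWith("m")) {
            expiry = start.plusMinutes(n);
          }
          if (ctx.abusech == null) {
            ctx.abusech = [:];
          }
          ctx.abusech.deleted_at = expiry.toString();   // hypothetical destination field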