Closed kcreddy closed 6 months ago
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
Hey @jamiehynds @andrewkroh
Looking at the docs for the 4 datastreams we support (url, malware, malwarebazaar, and threatfox), and after a conversation with folks from AbuseCH/Spamhaus, the following is the analysis for each datastream in order to support IOC expiration for AbuseCH:
1. URL
A download of URLs that are actively distributing malware or that have been added to URLhaus within the past 90 days is documented at https://urlhaus.abuse.ch/api/#csv. This is recommended by the AbuseCH folks to retrieve only relevant unexpired indicators. The API to use is https://urlhaus.abuse.ch/downloads/csv/
2. Malware (similar to the URL case; only the API is different)
A download of all payloads collected by URLhaus, identified by a hash (MD5 / SHA256), is documented at https://urlhaus.abuse.ch/api/#payloads. This is recommended by the AbuseCH folks to retrieve only relevant unexpired indicators. The API to use is https://urlhaus.abuse.ch/downloads/payloads/
3. Malware Bazaar
The current API is called with query as get_recent to retrieve the list of malware samples added to MalwareBazaar within the last 60 minutes. It is documented at: https://bazaar.abuse.ch/api/#latest_additions
An export of all hashes, generated only once per hour, is documented at https://bazaar.abuse.ch/export/#csv. This is recommended by the AbuseCH folks to retrieve only relevant unexpired indicators. The API to use is https://bazaar.abuse.ch/export/csv/full/
4. ThreatFox
The current API is called with query as get_iocs to retrieve a copy of the current IOC dataset from ThreatFox for a maximum of 7 days. It is documented at: https://threatfox.abuse.ch/api/#recent-iocs
An export of all IOCs, generated only once per hour, is documented at https://threatfox.abuse.ch/export/#json. This is recommended by the AbuseCH folks to retrieve only relevant unexpired indicators. The API to use is https://threatfox.abuse.ch/export/json/full/. It is also confirmed that this API provides the last 6 months' worth of indicators.
The transform could then query only these 4 new datastreams as source indices. Since none of them have expiration dates, and we fetch the full list of unexpired indicators, the transform retention behaviour could be the same as in ti_recordedfuture:
https://github.com/elastic/integrations/blob/main/packages/ti_recordedfuture/elasticsearch/transform/latest_ioc/transform.yml#L25
Let me know what you think about creating these 4 new datastreams to support IOC expiration, and about the approach.
Some thoughts on this after speaking in person.
What data streams will be annotated with labels.is_ioc_transform_source? It sounds like only the new data streams would be eligible to be used by the transform, and hence only new ones will receive that label.
Hey @andrewkroh
What is the difference in the data format between the currently used API and the proposed alternative API/download?
diff --git a/Users/kcreddy/Downloads/url-existing.json b/Users/kcreddy/Downloads/url-new.json
index 3b05f48d4..d6f254acf 100644
--- a/Users/kcreddy/Downloads/url-existing.json
+++ b/Users/kcreddy/Downloads/url-new.json
@@ -1,21 +1,17 @@
+"2711415": [
{
- "id": 2711415,
- "urlhaus_reference": "https:\/\/urlhaus.abuse.ch\/url\/2711415\/",
- "url": "http:\/\/23.236.203.81\/gEUBYPspBNL33.bin",
+ "urlhaus_link": "https://urlhaus.abuse.ch/url/2711415/",
+ "url": "http://23.236.203.81/gEUBYPspBNL33.bin",
"url_status": "online",
- "host": "23.236.203.81",
- "date_added": "2023-09-13 07:39:06 UTC",
+ "dateadded": "2023-09-13 07:39:06 UTC",
+ "last_online": "2023-09-13 07:39:06 UTC",
"threat": "malware_download",
- "blacklists": {
- "spamhaus_dbl": "not listed",
- "surbl": "not listed"
- },
"reporter": "abuse_ch",
- "larted": "true",
"tags": [
"encrypted",
"GuLoader",
"rat",
"RemcosRAT"
- ]
-}
\ No newline at end of file
+ ],
+}
+],
The new JSON is missing 2 fields, namely blacklists and larted. More details on these fields here: https://urlhaus-api.abuse.ch/#urls-recent
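To make the delta in the diff concrete, here is a minimal Python sketch that maps a record from the new download back onto the existing field names. The renames and the id-as-map-key layout are taken from the diff above; the function name is hypothetical, and deriving host from url is an assumption, not something the integration is confirmed to do.

```python
from urllib.parse import urlsplit

# Renames visible in the diff: new field name -> existing field name.
RENAMES = {"urlhaus_link": "urlhaus_reference", "dateadded": "date_added"}


def normalize_new_dump(dump):
    """Flatten {"<id>": [{...}]} into records that use the existing field names."""
    records = []
    for url_id, entries in dump.items():
        for entry in entries:
            record = {RENAMES.get(key, key): value for key, value in entry.items()}
            record["id"] = int(url_id)  # id moved from a field to the map key
            # Assumption: host can be recovered from the URL itself.
            record["host"] = urlsplit(record["url"]).hostname
            records.append(record)
    return records


# Sample record copied from the diff above.
new_dump = {
    "2711415": [
        {
            "urlhaus_link": "https://urlhaus.abuse.ch/url/2711415/",
            "url": "http://23.236.203.81/gEUBYPspBNL33.bin",
            "url_status": "online",
            "dateadded": "2023-09-13 07:39:06 UTC",
            "last_online": "2023-09-13 07:39:06 UTC",
            "threat": "malware_download",
            "reporter": "abuse_ch",
            "tags": ["encrypted", "GuLoader", "rat", "RemcosRAT"],
        }
    ]
}
records = normalize_new_dump(new_dump)
```

After normalization the only fields that cannot be recovered are blacklists and larted, which matches the observation above.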
Can we create only a single transform?
As per the transform API, the source can be an array of index patterns. So, we can define multiple sources and 1 destination index which contains all active IOCs.
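As a rough illustration of the single-transform idea, the sketch below builds a transform request body with several source index patterns and one destination. The index patterns are the ones quoted elsewhere in this thread; the destination index name, the unique_key fields, and the sort field are placeholders chosen for illustration, not the values the package uses.

```python
import json


def build_transform_body(source_patterns, dest_index):
    """Build a 'latest' transform body with many sources and one destination."""
    return {
        "source": {"index": list(source_patterns)},  # array of index patterns
        "dest": {"index": dest_index},
        "latest": {
            # Placeholder keys: one document is kept per unique combination.
            "unique_key": ["event.dataset", "threat.indicator.name"],
            # Placeholder sort field: the newest document per key wins.
            "sort": "@timestamp",
        },
    }


body = build_transform_body(
    [
        "logs-ti_abusech.url-*",
        "logs-ti_abusech.malware-*",
        "logs-ti_abusech.malwarebazaar-*",
        "logs-ti_abusech.threatfox-*",
    ],
    "logs-ti_abusech_latest.ioc",  # hypothetical destination index name
)
print(json.dumps(body, indent=2))
```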
What data streams will be annotated with labels.is_ioc_transform_source?
Only the new source datastreams will have that label.
Will users get duplicate alerts if they enable both data streams that pull from a given source (i.e. they enable both the existing and new urlhaus data streams)?
If they are using both datastreams (existing and new), we have a way to solve it, but it needs manual setup.
url (existing) and active_url (new) datastreams together.
@P1llus mentioned another approach: modifying the existing datastream so the user can switch between datasets that do or do not support IOC expiration. If the user has opted in (boolean switch) to IOC expiration, then the new URL will be used.
Proposed steps for this approach:
1. Add a boolean option ioc_expiry to switch to the URL that supports IOC expiration. Default could be false.
2. When ioc_expiry: true, create field labels.is_ioc_transform_source: true.
3. Using {{#if ioc_expiry}}, add the necessary logic to handle the new URL/dataset inside the input.hbs.yml file. If the httpjson input cannot handle this, then switch to the cel input.
4. The transform lists all the existing datasets as sources ("logs-ti_abusech.url-*", "logs-ti_abusech.malware-*", "logs-ti_abusech.malwarebazaar-*", "logs-ti_abusech.threatfox-*") but only queries the new datasets that support IOC expiration.
source:
  query:
    bool:
      filter:
        - exists:
            field: labels.is_ioc_transform_source
This way, the users are forced to only use one of current or new URL depending on their choice of IOC expiration support.
Since the Detection Rules query with NOT labels.is_ioc_transform_source: *, i.e., only documents where the field does not exist, they will query either the existing dataset (without IOC expiry support) or the new destination indices created by the transform.
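The split between what the transform reads and what the detection rules read can be sketched in plain Python. This only simulates the two filters on in-memory documents; the document names are made up, and it assumes the label is not carried over into the destination index, which the thread implies but does not state.

```python
LABEL = "is_ioc_transform_source"


def has_label(doc):
    """True when labels.is_ioc_transform_source exists on the document."""
    return LABEL in doc.get("labels", {})


def seen_by_transform(docs):
    """Equivalent of the 'exists' filter on the transform source."""
    return [doc["name"] for doc in docs if has_label(doc)]


def seen_by_rules(docs):
    """Equivalent of 'NOT labels.is_ioc_transform_source: *' in the rules."""
    return [doc["name"] for doc in docs if not has_label(doc)]


docs = [
    {"name": "source event, ioc_expiry false", "labels": {}},
    {"name": "source event, ioc_expiry true", "labels": {LABEL: True}},
    # Assumption: the destination documents no longer carry the label.
    {"name": "transform destination document"},
]

transform_docs = seen_by_transform(docs)
rule_docs = seen_by_rules(docs)
```

With these inputs the labelled source event is visible only to the transform, so a rule never matches the same indicator in both the source and the destination index.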
I prefer the new approach (https://github.com/elastic/integrations/issues/7322#issuecomment-1717434388), which keeps the same datastreams, because the previous approach of creating new datastreams requires manual setup.
@andrewkroh WDYT?
This way, the users are forced to only use one of current or new URL depending on their choice of IOC expiration support.
+1, let's try this.
One lingering question in my mind after looking at the diff in event fields for URLhaus is whether this small delta is justification for keeping support for the old API. Do we think our users will be affected if we change to using only the API that supports expiration? I'm thinking that as long as indicator match rules continue to work, users won't care (a loss of fields like larted won't affect indicator match).
A few things I tried with the existing httpjson input didn't quite work, since the response is a compressed zip containing a JSON file.
Options tried out with httpjson:
After decoding as application/zip, split on the "body" field as a map.
Config:
response.decode_as: application/zip
response.split:
  target: body
  type: map
  keep_parent: false
Error: "invalid target: body accessing 'response.split'"
Copy the body into body.data template value as JSON and then split.
response.decode_as: application/zip
response.transforms:
  - set:
      target: body.data
      value: '[[ toJSON .last_response.body ]]'
      value_type: json
response.split:
  target: body.data
  type: map
  keep_parent: false
Agent logs show 1 event published: "request finished: 1 events published", which indicates the split is not working.
Error inside Elasticsearch logs: failed to execute bulk action create: source[n/a, actual length 32 mb, max length 2 kb].... Fail to parse: limit of 10000 fields has been exceeded while adding new fields [9898]
I would like to use CEL input to achieve this as it seems easier. From a user standpoint, they would have to switch the inputs.
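For reference, the behaviour both httpjson attempts were aiming for (unzip the response, then emit one event per map entry instead of one huge document) can be sketched outside the agent. This is plain Python, not CEL or httpjson config; it assumes the zipped JSON has the same map-of-lists shape as the diff earlier in this thread, and the file name and sample values are made up.

```python
import io
import json
import zipfile


def split_zipped_map(zip_bytes):
    """Unzip a JSON map of {"<id>": [entry, ...]} and return one event per entry."""
    events = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                payload = json.load(member)
            for key, entries in payload.items():
                for entry in entries:
                    events.append({"id": key, **entry})
    return events


def make_sample_zip():
    """Build a tiny in-memory zip standing in for the real download."""
    sample = {
        "1": [{"ioc": "example.com"}],
        "2": [{"ioc": "198.51.100.7"}],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("full.json", json.dumps(sample))  # made-up file name
    return buffer.getvalue()


events = split_zipped_map(make_sample_zip())
```

Emitting one small event per entry is what avoids the single 32 mb document and the 10000-field limit seen in the Elasticsearch error above.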
After some tests, a full download of all indicators for ThreatFox, Malware and MalwareBazaar has to be ruled out due to performance limitations experienced by the transform. The compressed zip files are > 500MB, and only a very bulky agent is able to ingest all indicators. The transform itself is not able to fully move the data from source to destination.
So, for these 3 datastreams (ThreatFox, Malware and MalwareBazaar), the existing ingest pipelines are modified to add a deleted_at timestamp, which takes its value from a user-configurable Indicator Expiration Time. Users set when they want indicators to expire; the default is 90d.
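A minimal sketch of how such a deleted_at value could be computed is shown below. It assumes deleted_at is the indicator's first-seen time plus the configured expiration; the thread does not say which timestamp the pipeline uses as the base, and the function names and the supported duration units are illustrative only.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Parse a duration such as '90d' or '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def deleted_at(first_seen, expiration="90d"):
    """Assumption: the indicator expires at first_seen + expiration."""
    return first_seen + parse_duration(expiration)


def is_expired(first_seen, now, expiration="90d"):
    """True once 'now' has reached the computed deleted_at timestamp."""
    return now >= deleted_at(first_seen, expiration)


first_seen = datetime(2023, 9, 13, 7, 39, 6, tzinfo=timezone.utc)
expires = deleted_at(first_seen)  # uses the 90d default
```

With the 90d default, an indicator first seen on 2023-09-13 07:39:06 UTC would get a deleted_at of 2023-12-12 07:39:06 UTC under this assumption.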
Individual tracking issue for adding IOC expiration support to the ti_abusech package. Meta Issue - https://github.com/elastic/integrations/issues/5369