enotspe / fortinet-2-elasticsearch

Fortinet products logs to Elasticsearch
Apache License 2.0

[SUGGESTION] Adding a BLACKLIST pipeline #21

Open Cyb3rSn0rlax opened 4 years ago

Cyb3rSn0rlax commented 4 years ago

A small suggestion, if it is aligned with your vision of the project: enable people to add bad IPs to their events and change event.kind to alert once a bad IP is detected, in order to raise it in the SIEM app. This is especially beneficial when you have multiple fortiXX instances or many other solutions: you can centralize your blacklist and enrich your logs even further in a nice and easy way. I can make a PR if you want.


enotspe commented 4 years ago

Please make that PR!!!!

I had in mind adding this kind of integration later on (very far away), or waiting for Elastic to add it. I was thinking of using Minemeld as a Threat Intelligence aggregator; I also noticed that there is a MISP module.

Anyway, I reviewed it a little bit and found several challenges:

  1. Threat Intel aging: we had a Minemeld instance getting IOCs from around 15 feeds, and the numbers kept growing and growing; after a couple of weeks we had around 500K IOCs. To put that in perspective, FortiGuard manages fewer than 100k. I think MISP can provide some value for aging already on the feed. With Minemeld it will have to be set manually (how long is too long?)
  2. Logstash vs. ingest pipeline: for such a heavy lookup, I am not sure which performs better, Logstash or an ingest pipeline with enrich processors. Some benchmark tests would help (if somebody can share or find one).
  3. Backwards lookup: with threat intel, you not only want to enrich your current log feed, but also to look backwards and check whether an old log matches a new IOC, so you can find past traces of malicious activity. This could be done in Logstash with the ES filter plugin, but again, performance might be an issue. Maybe a better way would be the new async searches.
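
The ES-plugin approach for point 3 could be sketched with the logstash-filter-elasticsearch plugin, roughly like this; the index name and threat field names here are hypothetical, and performance at scale is untested:

```
# Sketch only: "threats" index and threat.* field names are assumptions.
filter {
  elasticsearch {
    hosts  => ["localhost:9200"]
    index  => "threats"
    # look up the current event's destination IP in the threat index
    query  => "threat.value:%{[destination][ip]}"
    # copy a field from the matching threat document into the event
    fields => { "threat.type" => "[threat][matched_type]" }
  }
}
```

This enriches the live feed only; the true backwards lookup (old logs vs. new IOCs) would still need a separate batch search over historical indices.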

Let us know what you think, what challenges you see, and what strategy you plan to use.

Again, many thanks for your contributions and ideas

Cyb3rSn0rlax commented 4 years ago

Hello, sorry for the late response. I was thinking about basically 3 milestones:

- [ ] **Short-term Milestone** :

```
filter {
  if "Firewall" in [observer][type] and "public" in [destination][locality] {
    translate {
      field           => "[destination][ip]"
      destination     => "BLACKLISTED"
      dictionary_path => "/home/ubuntu/intel/blacklist/BlackListIP.yml"
      fallback        => "NO"
    }
  }
}
```

```
output {
  pipeline {
    send_to => "logstash_enrichment"
  }
}
```

Where BlackListIP.yml is something like ("IP": "YES" for blacklisted):

```yaml
"103.129.98.17": "YES"
"103.253.73.77": "YES"
"103.83.81.144": "YES"
"104.18.36.98": "YES"
"107.175.64.210": "YES"
"108.171.216.194": "YES"
"110.4.45.119": "YES"
"184.168.221.43": "YES"
"185.104.45.20": "YES"
"185.174.100.116": "YES"
"185.193.38.74": "YES"
"192.138.20.112": "YES"
"217.174.152.68": "YES"
"50.63.202.57": "YES"
"62.173.145.104": "YES"
"69.167.178.28": "YES"
"79.96.191.147": "YES"
"79.98.28.30": "YES"
"85.93.145.251": "YES"
"91.189.114.7": "YES"
"94.23.64.40": "YES"
```

Then we can use field coloring and even scripted fields to automate the analysis further and make extra correlations:
![image](https://user-images.githubusercontent.com/18106793/82174395-17e13780-98c0-11ea-9b1a-c9b6f5593dad.png)

- [ ] **Mid-term Milestone** : 

Threat aging and backwards lookup are real challenges due to the design of Elasticsearch, which under the hood creates new documents instead of updating them in place.

For threat aging, it really depends on the consumer, their Threat Intel needs and, most importantly, the size of their cluster, because you can't just keep all the data. Each organization should define its needs in this matter and, realistically, according to the Pyramid of Pain, IOCs are among the most easily changed indicators.
However, we can handle this challenge by creating dedicated indices under the ecs-* index pattern, with a special mapping based on the IOC creation timestamps and an ILM (index lifecycle management) policy in a hot/warm/cold architecture.
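
As a sketch, an ILM policy along those lines could look like the following; the policy name and phase ages are placeholders, not settled values:

```json
PUT _ilm/policy/threat-intel
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "7d" } } },
      "warm":   { "min_age": "30d", "actions": { "shrink": { "number_of_shards": 1 } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
```

The delete age is effectively the answer to "how long is too long?" and would have to be tuned per organization.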

For the backwards lookup, I think we can use fingerprinting with Logstash to avoid duplicates and update documents instead of creating new ones. The challenge here is that an IP might be good today but turn malicious later; instead of creating a new document, we should update the first one by replacing the id of every ingested threat intel document with its fingerprint. This approach was used in this project: https://git.deepaknadig.com/deepak/sdn-threat-intelligence/-/tree/master/

```
filter {
  mutate {
    add_field => { "occurrences" => 1 }
  }

  # Fingerprinting to remove duplicates
  fingerprint {
    concatenate_sources => true
    source              => ["[threat-data][type1]", "[threat-data][value1]"]
    target              => "[@metadata][fingerprint]"
    method              => "MURMUR3"
  }
}
```

and the output is like this:

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    action        => "update"
    doc_as_upsert => true
    script        => "if (ctx._source.occurrences != null) {ctx._source.occurrences += 1}"
    # Replace "document_id" with the fingerprint
    document_id   => "%{[@metadata][fingerprint]}"
    index         => "threats"
  }
}
```



The other key for this to work properly is to create the threat intel index under the ECS index pattern, so that we can create scripted fields to correlate IOCs with the observer fields (source.ip, destination.ip, hash value, domain name...). Maybe we can then build our queries based on the date of ingestion (not the creation of the IOC) and, once it is old enough, have our ILM policy move it to warm and then delete it. (This theory definitely has some flaws, but we should try and test everything.)

- [ ] **Long-term Milestone**:

Create a web application (Flask, Python, or anything simple) to automate the upload, download, and ingestion tasks for SIEM engineers and Threat Intelligence analysts, for example:

- Add a list of users on vacation.
- Update the list by deleting or adding a user.
- Add a temporary list of internal IPs of possibly infected workstations.

Basically, this application would replace the manual effort of creating YAML dictionaries or cron jobs that download CTI feeds.

Tell me what you think of the validity of these actions.

enotspe commented 4 years ago

Wow! That looks super interesting. The mid-term solution seems quite promising.

Just as a suggestion, I think we should work with ingest processors instead of Logstash dictionary lookups. That way you could also visualize your threat intel data. Actually, the next step for the project is to move all lookups to enrich processors.
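
For reference, an enrich-based version of the blacklist lookup might look roughly like this; the policy name, source index, and field names are assumptions, not the project's actual config:

```json
PUT /_enrich/policy/blacklist-policy
{
  "match": {
    "indices": "threat-intel",
    "match_field": "threat.ip",
    "enrich_fields": ["threat.type"]
  }
}

POST /_enrich/policy/blacklist-policy/_execute

PUT /_ingest/pipeline/blacklist-enrich
{
  "processors": [
    {
      "enrich": {
        "policy_name": "blacklist-policy",
        "field": "destination.ip",
        "target_field": "threat_match",
        "ignore_missing": true
      }
    },
    {
      "set": {
        "if": "ctx.threat_match != null",
        "field": "event.kind",
        "value": "alert"
      }
    }
  ]
}
```

One caveat: the enrich index is built when the policy is executed, so the policy has to be re-executed (e.g. on a schedule) whenever the threat-intel source index changes.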

About the short term: as I said, a threat intel database can easily get as big as 100k IPs, so I don't know how a Logstash lookup would impact performance, especially because firewall logs can get very heavy (if you enable log-all on the implicit deny rule, for example). You can easily go above 2k EPS. It is worth trying out, though; let's push the limits until it breaks.

nicpenning commented 4 years ago

I have some general comments/questions:

Does a dictionary file get updated in near real time when changed, or does it require a redeploy to take effect?

What about SIEM detection rules that trip not only on Fortinet logs but on any log that makes it into Elastic? A simple PowerShell or Python script, in conjunction with some webhook functionality, should be able to create these rules.

I do like the idea of having events tagged, though, because you could then use alerting when those tags trip.

Good ideas all around.

enotspe commented 4 years ago

Looks like these guys are one step ahead: https://www.youtube.com/watch?v=8yf9DJ_TO6o

Especially when taking scalability into consideration. Blacklists can grow a lot, and we all know firewalls generate tons of logs as well.

@nicpenning when you put new data into a dictionary, logs get enriched with it automatically; the translate filter re-reads the dictionary file periodically, so there is no need to restart your Logstash service.
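
For context, the reload period is configurable on the translate filter itself; a minimal sketch (the dictionary path here is a placeholder):

```
filter {
  translate {
    field            => "[destination][ip]"
    destination      => "BLACKLISTED"
    dictionary_path  => "/etc/logstash/BlackListIP.yml"
    # how often (in seconds) the file is re-read; 300 is the plugin default
    refresh_interval => 300
    fallback         => "NO"
  }
}
```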