Mordor Data goes to indexme-*

tschohanna commented 3 years ago

Describe the problem

When trying to ingest data from the mordor dataset into HELK with kafkacat, all the data goes into the indexme- index pattern and not into the actual logs- index pattern.

Provide the output of the following commands

# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

# echo -e "\nDocker Space:" && df -h /var/lib/docker; echo -e "\nMemory:" && free -g; echo -e "\nCores:" && getconf _NPROCESSORS_ONLN

Docker Space:
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv   37G   17G   18G  49% /

Memory:
              total        used        free      shared  buff/cache   available
Mem:              7           6           0           0           1           0
Swap:             3           0           3

Cores:
4

# docker ps --filter "name=helk"
CONTAINER ID   IMAGE                                                 COMMAND                  CREATED      STATUS       PORTS                                                                                                                                                                                                  NAMES
ae0c84a466b5   confluentinc/cp-ksql-server:5.1.3                     "/etc/confluent/dock…"   5 days ago   Up 3 hours   0.0.0.0:8088->8088/tcp                                                                                                                                                                                 helk-ksql-server
a48745daf621   otrf/helk-kafka-broker:2.4.0                          "./kafka-entrypoint.…"   5 days ago   Up 3 hours   0.0.0.0:9092->9092/tcp                                                                                                                                                                                 helk-kafka-broker
030fd322036d   otrf/helk-zookeeper:2.4.0                             "./zookeeper-entrypo…"   5 days ago   Up 3 hours   2181/tcp, 2888/tcp, 3888/tcp                                                                                                                                                                           helk-zookeeper
bd9341225ebb   otrf/helk-elastalert:latest                           "./elastalert-entryp…"   5 days ago   Up 3 hours                                                                                                                                                                                                          helk-elastalert
85da22afdb6e   otrf/helk-logstash:7.6.2.1                            "/usr/share/logstash…"   5 days ago   Up 3 hours   0.0.0.0:3515->3515/tcp, 0.0.0.0:5044->5044/tcp, 0.0.0.0:5514->5514/tcp, 0.0.0.0:5514->5514/udp, 0.0.0.0:8515-8516->8515-8516/tcp, 0.0.0.0:8531->8531/tcp, 0.0.0.0:8515-8516->8515-8516/udp, 9600/tcp   helk-logstash
49bf3e249386   otrf/helk-nginx:0.3.0                                 "/opt/helk/scripts/n…"   5 days ago   Up 3 hours   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp                                                                                                                                                               helk-nginx
4efbfc77a1c1   docker.elastic.co/kibana/kibana:7.6.2                 "/usr/share/kibana/s…"   5 days ago   Up 3 hours   5601/tcp                                                                                                                                                                                               helk-kibana
ad51907feaee   docker.elastic.co/elasticsearch/elasticsearch:7.6.2   "/usr/share/elastics…"   5 days ago   Up 3 hours   9200/tcp, 9300/tcp                                                                                                                                                                                     helk-elasticsearch

What version of HELK are you using

# git log -1 --oneline
b40f92f (HEAD -> master, origin/master, origin/HEAD) Update kibana.md

What steps did you take trying to fix the issue

I tried to ingest the data with the winevent topic, because the winlogbeat topic did not work, but the results were the same. I do not have a lot of knowledge about Kafka and Logstash, where I assume the issue is, but I tried to analyse the Logstash configurations and found nothing that I could do to fix the issue.

How could we replicate the issue

Install HELK with option 3 of the install script. Download the mordor dataset. Install kafkacat. Try to ingest the data in the two following ways:

# kafkacat -b localhost:9092 -t winlogbeat -P -l empire_mimikatz_logonpasswords_2020-08-07103224.json
# kafkacat -b localhost:9092 -t winevent -P -l empire_mimikatz_logonpasswords_2020-08-07103224.json

priamai commented 3 years ago

Yes, this functionality seems to be broken. Let me elaborate:

1) The manual suggests that kafkacat should publish to the mordor topic.
2) The Logstash Kafka input is topics => ["winlogbeat","winevent","SYSMON_JOIN","filebeat"], so the mordor topic is not present.
3) There is a mordor Logstash pipeline input (port 3515) here: https://github.com/Cyb3rWard0g/HELK/tree/master/docker/helk-logstash/mordor_pipeline which then sends to the Kafka topic winevent.

So we should try that: send the json file to that 3515 port and see what happens.

I agree, though, that there must be a clear path via Kafka. Let me think about how we can resolve that.

priamai commented 3 years ago

By the way, I am also wondering whether they used NXLog (https://nxlog.co/products/nxlog-community-edition) for the Windows log collection. This could have some implications for the field structure.

priamai commented 3 years ago

I also found another confusing instruction in Mordor-Elastic. There is a logstash output option:

!python3 Mordor-Elastic.py --no-index-creation --output logstash --url logstash-ip:3515 events.tar.gz

Inside the code:

r = requests.post(logstash_url, json=event, verify=verify_certs)

which performs an HTTP request, but we need to send the events over plain TCP, as per my previous notes.

I believe the logstash output is currently broken (the elasticsearch one, on the other hand, worked for me). I will attempt to write a new script.

priamai commented 3 years ago

I found a way; I will post the instructions here shortly!

priamai commented 3 years ago

Okay, I sang victory too early. I did manage to send the mordor JSON files to the Logstash pipeline, which then feeds them into Kafka. The code looks like this:

import socket
import sys

HOST = "helk-logstash"
PORT = 3515

def get_constants(prefix):
    """Create a dictionary mapping socket module
    constants to their names.
    """
    return {
        getattr(socket, n): n
        for n in dir(socket)
        if n.startswith(prefix)
    }

families = get_constants('AF_')
types = get_constants('SOCK_')
protocols = get_constants('IPPROTO_')

# Create a TCP/IP socket
sock = socket.create_connection((HOST, PORT))

print('Family  :', families[sock.family])
print('Type    :', types[sock.type])
print('Protocol:', protocols[sock.proto])
print()

try:
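    with open('test_log.json','r') as file:
        json_lines = file.read()
        # Count non-empty lines so a trailing newline does not inflate the total
        tot_lines = sum(1 for line in json_lines.splitlines() if line.strip())
        print('Loading ... %d events' % tot_lines)
        sock.sendall(json_lines.encode())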
except socket.timeout as e:
    print(e)
except socket.error as e:
    # Something else happened, handle error, exit, etc.
    print(e)
finally:
    print('closing socket')
    sock.close()

However, the events get dropped because the mordor JSON files are just EVTX converted to JSON. Basically, that pipeline is meant to process NXLog-formatted events.

Currently I do not see a way to send the JSON events to the correct processing pipeline, simply because there is no pipeline to process them.

What I ended up doing is to load directly into ES index with this command:

!python3 Mordor-Elastic.py --no-index-creation --output elasticsearch --url helk-elasticsearch events.tar.gz

which saves the documents inside winlogbeat-mordor.
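To double-check which index the documents actually landed in (winlogbeat-mordor, logs-* or indexme-*), one option is a terms aggregation on the built-in `_index` field. This is only a minimal sketch that builds the request body for a `_search` call; the function name and the `max_indices` parameter are my own, and sending the body to helk-elasticsearch is left out because it depends on how your instance is exposed.

```python
import json


def index_count_query(max_indices=20):
    """Build a _search body that counts documents per index.

    Hypothetical helper: the name and max_indices are illustrative only.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "by_index": {
                "terms": {"field": "_index", "size": max_indices}
            }
        },
    }


# Print the body so it can be pasted into Kibana Dev Tools or sent with any HTTP client
print(json.dumps(index_count_query(), indent=2))
```

Each bucket in the response should then show an index name and its document count, which makes it easy to see whether new events went to indexme-* instead of logs-*.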

neu5ron commented 3 years ago

yes I think the functionality may be an issue - @Cyb3rWard0g any idea? Sorry I hadn't really messed w/ all the mordor pipeline since it got split into logstash "pipelines". But let me know how I can help.

@priamai @tschohanna if you output to port 8531 tcp does that change anything? https://github.com/Cyb3rWard0g/HELK/blob/master/docker/helk-logstash/pipeline/0005-nxlog-winevent-syslog-tcp-input.conf

nxlog or not - HELK handles winlogbeat and nxlog.

priamai commented 3 years ago

Testing your suggestion now. Nevertheless, the major problem we are facing is the lack of the original .EVTX files, which would allow a full replay via winlogbeat and trigger the full Logstash+Kafka pipeline. Yes, the size will be bigger, but it will allow full end-to-end testing.

priamai commented 3 years ago

@neu5ron yes, I have pushed the events into the 8531 Logstash input, but they end up in the indexme index. I also visually inspected the Logstash pipeline via Kibana and can see that:

By the way, the pipeline is a massive if-then-else switch, and it is very hard to debug where the parse failure happened. I am also not familiar with the NXLog format, and I don't know whether there are some extra fields that we need to artificially add to the JSON files.
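One way to approach the field question is to list which top-level keys the mordor events actually carry, so they can be compared by hand against what the NXLog filters look for. A rough sketch, assuming the file is newline-delimited JSON; the function name is mine and the sample field names below are made up for illustration, not taken from a real mordor file:

```python
import json
from collections import Counter


def field_counts(lines):
    """Count how often each top-level key appears across JSON-lines events."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        event = json.loads(line)
        if isinstance(event, dict):
            counts.update(event.keys())
    return counts


# Hypothetical sample events; real files will have different fields
sample = [
    '{"EventID": 1, "Channel": "Microsoft-Windows-Sysmon/Operational"}',
    '{"EventID": 4688, "Channel": "Security", "Hostname": "host1"}',
    '',
]
counts = field_counts(sample)
for name, seen in counts.most_common():
    print(name, seen)
```

For a real file, passing the open file object works the same way, e.g. `field_counts(open('test_log.json'))`.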

neu5ron commented 3 years ago

OK, this should be easy to fix.

Well, it's parsing, so yeah, it's a lot. But every single filter an event hits is added as a tag, so you can see exactly where everything hits. If you have suggestions, I would appreciate them.

priamai commented 3 years ago

Good point on the tags. This is what I see:

    "etl_pipeline": [
      "all-filter-0098",
      "all-add_processed_timestamp",
      "fingerprint-0099-002",
      "json-0301-001"
    ],

Therefore I am guessing this is the last stage the event passes through: https://github.com/Cyb3rWard0g/HELK/blob/master/docker/helk-logstash/pipeline/0301-nxlog-winevent-to-json-filter.conf

For the Logstash layout, I found this structure easier to maintain:

https://github.com/enotspe/fortinet-2-elasticsearch/blob/master/logstash/10-input_syslog_fortinet

They define an input-filter-output pattern in each configuration file with the pipeline construct. It will be quite a big task to convert the existing one to that format, for sure.
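To do the same check across many documents instead of eyeballing one, something like the following could group events by the last tag in `etl_pipeline`. This is a sketch under the assumption that tags are appended in the order the filters run, so the last entry is the last filter hit; the function names are mine, and the sample documents reuse the tags shown above.

```python
from collections import Counter


def last_stage(doc):
    """Return the last etl_pipeline tag of a document, or None if there is none."""
    tags = doc.get("etl_pipeline") or []
    return tags[-1] if tags else None


def summarize_last_stages(docs):
    """Count how many documents ended at each final tag."""
    return Counter(last_stage(doc) for doc in docs)


# Sample documents; the first one mirrors the tag list pasted above
docs = [
    {"etl_pipeline": ["all-filter-0098", "all-add_processed_timestamp",
                      "fingerprint-0099-002", "json-0301-001"]},
    {"etl_pipeline": ["all-filter-0098"]},
    {},  # a document without any tags
]
summary = summarize_last_stages(docs)
for tag, count in summary.most_common():
    print(tag, count)
```

If most events in indexme-* share the same final tag, that filter file is a reasonable place to start looking.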

neu5ron commented 3 years ago

@priamai try now with the updated branch. There was an incorrect removal of type that then knocked most of the flow off.

Thanks for the suggestion. That sort of flow for LS works for smaller pipelines. Honestly, the best way is probably multi-pipeline, but that doesn't work well at scale, NOR does it work great when tons of pipelines share the same code (like geo enrich). It reduces performance a lot. Not really a perfect answer either way, it's all a trade-off, but I am open to other suggestions.

neu5ron commented 3 years ago

re-open if still an issue, thanks!
