Question about the cluster

1ncludeSteven commented 9 months ago

I recently had the chance to delve into your paper and replicate the findings using the data provided in DeepLog. I was intrigued by the process and have a couple of inquiries that I hope you could help me with:

1. Regarding the Context Builder stage, I wanted to clarify if providing only the timestamp, event, and machine for each warning is sufficient. Are additional details like the source IP, destination IP, etc., unnecessary for this stage?
2. Concerning the clustering process, I'm curious if the clustering is primarily based on the attention vector derived from each event. If that's the case, do the events within the same cluster exhibit similarities akin to BitCoinMiner? Or, are there instances where some events are somewhat similar to BitCoinMiner?

Your insights on these points would be immensely valuable. Thank you very much for your time and consideration.

Thijsvanede commented 9 months ago

Of course:

1. Yes, the timestamp, event and machine are sufficient for DeepCASE to perform the analysis process. If security operators manually analyze the event sequences, auxiliary information such as src/dst IP may be useful to them, but the process that DeepCASE performs does not require it.
2. The clustering is indeed based on the attention vectors of each event. In fact, if we have multiple instances of the same event in the sequence, we use the combined attention of these events. To illustrate, consider a sequence A B B with attention values 0.2 0.3 0.5; the vector used for clustering then looks like 0.2 0.8 (in other words, A: 0.2 and B: 0.3 + 0.5 = 0.8). Regarding the case for BitCoinMiner, I cannot comment on specific instances due to our NDA. I can say that we observed many clusters that all had pretty much the exact same sequence/attention and some clusters where there was some variance in the sequences.
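The A B B example above can be sketched in a few lines of Python. This is only an illustration of the summing step as described in this thread, not DeepCASE's actual code; the function name `combine_attention` is hypothetical.

```python
from collections import defaultdict


def combine_attention(events, attention):
    """Sum the attention of repeated events into one value per event type.

    Hypothetical helper for illustration; not part of the DeepCASE API.
    """
    if len(events) != len(attention):
        raise ValueError("events and attention must have the same length")

    combined = defaultdict(float)
    for event, weight in zip(events, attention):
        # Repeated occurrences of the same event accumulate their attention.
        combined[event] += weight
    return dict(combined)


# Sequence A B B with attention values 0.2 0.3 0.5 from the example above.
vector = combine_attention(["A", "B", "B"], [0.2, 0.3, 0.5])
print(vector)
```

The result holds one entry per distinct event, so A keeps 0.2 while the two B occurrences are merged into a single value of about 0.8.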

I hope this answers your question.

1ncludeSteven commented 9 months ago

For question 2, do events in the same cluster have similar attention values? Do similar attention values mean that the events in the cluster are similar? For example, could Phishing: Financial Sector, Phishing: Payment Service, Phishing: Social Networking and Phishing: e-Commerce end up in the same cluster because these events are similar? I'm curious about what types of events are in the same cluster.

Thijsvanede commented 9 months ago

No, as of right now, all event types are treated as being completely different from one another. That means that the following two sequences will be considered to have no overlap:

Sequence 1 with 0.3 attention to Phishing: Financial Sector and 0.7 attention to Phishing: Payment Service; and
Sequence 2 with 0.3 attention to Phishing: Social Networking and 0.7 attention to Phishing: e-Commerce.

However, sequences containing different events may still be clustered together in some cases, if there is enough overlap in the attention of the same events. Consider the following case (I abbreviate the phishing examples to fin, pay, soc, and com):

Context 1: fin, fin, fin, pay, Event pay and attention vector [0.03, 0.03, 0.03, 0.91] (corresponding to context)
Context 2: com, com, com, pay, Event pay and attention vector [0.03, 0.03, 0.03, 0.91] (corresponding to context)

In this case the first sequence is represented as fin:0.09, pay:0.91 and the second sequence as com:0.09, pay:0.91. The two have a big overlap on pay and may therefore be clustered together.
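To make the two cases in this comment concrete, here is a minimal Python sketch that compares per-event attention vectors. The measure used (the attention mass both vectors place on the same event types) and the name `shared_attention` are assumptions chosen for illustration; the distance function DeepCASE actually uses for clustering may differ.

```python
def shared_attention(a, b):
    """Attention mass that two per-event vectors place on the same events.

    Hypothetical overlap measure for illustration only; events that appear
    in just one of the vectors contribute nothing, because all event types
    are treated as completely different from one another.
    """
    return sum(min(a[event], b[event]) for event in a.keys() & b.keys())


# First case: no event type is shared between the two sequences.
seq1 = {"fin": 0.3, "pay": 0.7}
seq2 = {"soc": 0.3, "com": 0.7}

# Second case: both contexts put most of their attention on pay.
ctx1 = {"fin": 0.09, "pay": 0.91}
ctx2 = {"com": 0.09, "pay": 0.91}

print(shared_attention(seq1, seq2))
print(shared_attention(ctx1, ctx2))
```

Under this assumed measure the first pair shares no attention at all, while the second pair shares 0.91 through pay even though fin and com are unrelated event types.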

1ncludeSteven commented 9 months ago

Thank you for your response, it was helpful!!

Thijsvanede / DeepCASE

Question about the cluster #9