Thijsvanede / DeepCASE

Original implementation and resources of DeepCASE as in the S&P '22 paper
MIT License

Question about the Lastline dataset #5

Closed gen3111620 closed 2 years ago

gen3111620 commented 2 years ago

Hi, I have some questions about your Lastline dataset description and the model input.

In your paper, I saw that you describe a dataset with 291 unique types of security events, and that 7.8M events were used to give security operators additional information...

  1. Is this additional information part of the 291 security events or not? If not, could you give me some examples?
  2. Are the model input sequences grouped by user behavior (session-based, like the HDFS dataset), or do they contain all system log messages (no sessions, just sorted by time)?

Thanks.

Thijsvanede commented 2 years ago

I am not sure whether I understand your questions correctly, but let me try to answer them:

  1. Descriptions of the 291 different types of security events can be found here: https://github.com/Thijsvanede/DeepCASE/tree/main/mapping#events
  2. The input sequences are grouped by the machine for which the alert was produced (using the IP address to group machines) and then sorted by time. So a sequence of alerts is from a single machine only, but can be produced by different processes or over multiple sessions.
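The grouping described in point 2 can be sketched roughly as follows. This is a minimal illustration, not the DeepCASE preprocessing code: the `(timestamp, machine_ip, event_type)` tuple layout, the function name `group_events_by_machine`, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def group_events_by_machine(events):
    """Group events per machine (keyed by IP) and sort each group by time.

    `events` is an iterable of (timestamp, machine_ip, event_type) tuples.
    This tuple layout is hypothetical and chosen only for illustration.
    """
    per_machine = defaultdict(list)
    for timestamp, machine_ip, event_type in events:
        per_machine[machine_ip].append((timestamp, event_type))

    # Sort each machine's events by timestamp and keep only the event types.
    return {
        machine_ip: [event for _, event in sorted(items, key=lambda item: item[0])]
        for machine_ip, items in per_machine.items()
    }


# Hypothetical sample data: events from two machines, arriving out of order.
events = [
    (3, "10.0.0.1", "event_a"),
    (1, "10.0.0.2", "event_b"),
    (1, "10.0.0.1", "event_c"),
    (2, "10.0.0.1", "event_b"),
]
sequences = group_events_by_machine(events)
```

Each resulting sequence contains events from a single machine only, regardless of which process or session produced them.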

I hope that clarifies the dataset a bit. Please let me know if anything is unclear.

gen3111620 commented 2 years ago

Thanks a lot, I have seen this event mapping in your repository. In the paper, do your input sequences consist only of the 291 security events? Do they not include normal log messages (other event logs)?

Thijsvanede commented 2 years ago

The input sequences for our work consist of the latest 10 security events at each timestep, where each security event can be one of the aforementioned 291 events. So the events that we are working with are not plain log messages, but instead security events generated by, e.g., an intrusion detection system (IDS) or network security monitor (NSM) that analyses these log messages. This is illustrated in Figure 1 of our paper: https://vm-thijs.ewi.utwente.nl/static/homepage/papers/deepcase.pdf
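
As a rough illustration of "the latest 10 security events at each timestep", the windowing over a single machine's sequence could look like the sketch below. It is not the DeepCASE implementation. The function name `latest_event_windows`, the integer event IDs, the `pad` value for timesteps with fewer than `size` events, and the choice to include the current event in its own window are all assumptions.

```python
def latest_event_windows(sequence, size=10, pad=-1):
    """Return, for each timestep, the `size` most recent events up to that step.

    Windows near the start of the sequence are left-padded with `pad`.
    Padding and including the current event are assumptions of this sketch.
    """
    windows = []
    for i in range(len(sequence)):
        window = sequence[max(0, i - size + 1): i + 1]
        padding = [pad] * (size - len(window))
        windows.append(padding + window)
    return windows


# Hypothetical event IDs for one machine; size=3 keeps the output short.
windows = latest_event_windows([5, 17, 5, 42], size=3)
for window in windows:
    print(window)
```

With `size=10`, every window holds ten entries, each being either the padding value or one of the security event types.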