FDUDSDE / MAGIC

Codes and data for USENIX Security 24 paper "MAGIC: Detecting Advanced Persistent Threats via Masked Graph Representation Learning"
MIT License

Results on dataset E3-theia #14

Closed · Acomand closed this issue 3 months ago

Acomand commented 3 months ago

I'd like to know which file is the test set of the theia dataset.

If I use the graphs you provided, I can reproduce the results.

But if I run `python trace_parser.py --dataset theia` myself, the generated file `test0.pkl` differs from the one you provide, and the result is far from the reported one (whether I use the checkpoint you provided or retrain the model).

So I'd like to know: is `ta1-theia-e3-official-6r.json.8` the test set? (I have also tried other files as the test set, but the results are just as bad.) Could you please rerun trace_parser.py to check the code?

(The results on trace and cadets are fine.)

Thanks a lot.

zmkzmkzmkzmkzmk commented 3 months ago

Do you have any problems reproducing the results on the cadets dataset? After I processed the data and used it for model training, I got the following results during validation: [screenshot]

I repeated the experiment five times and the results were similar.

Acomand commented 3 months ago

When I use the dataset E3-cadets, I get results similar to yours.

But I cannot reproduce the results on the dataset E3-theia. I run the code as:

```bash
cd utils && python trace_parser.py --dataset theia
cd ..
python train.py --dataset theia
python eval.py --dataset theia
```

And the result is: [screenshot]

Jimmyokok commented 3 months ago

The correct way: put `ta1-theia-e3-official-6r.json` through `ta1-theia-e3-official-6r.json.8` (ALL of 0-8) into the data directory. The parser then reads entities from 0-8, parses 0-3 into training graphs, and parses 8 into the test graph. I tried parsing E3-THEIA without 4-7 present, and many entities were lost, including several thousand malicious ones, because they are defined only in 4-7. Additionally, the peak-detection threshold is pre-defined in the eval script. If you are training and detecting from scratch, you can adjust that threshold to make the confusion matrix look normal, or simply refer to the AUC for threshold-insensitive evaluation.
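
For anyone hitting the same problem, here is a minimal pre-flight sketch (file names are taken from this thread; the `data/` location and the `check_theia_shards` helper are assumptions for illustration, not part of the MAGIC repo) that fails loudly if any of the nine E3-THEIA shards is missing:

```python
import os

# Assumed input directory; point this at wherever trace_parser.py reads from.
DATA_DIR = "data"

# Shard 0 is the bare .json file; shards 1-8 carry a numeric suffix,
# matching the file names mentioned in this thread.
SHARDS = ["ta1-theia-e3-official-6r.json"] + [
    f"ta1-theia-e3-official-6r.json.{i}" for i in range(1, 9)
]

def check_theia_shards(data_dir: str = DATA_DIR) -> None:
    """Raise if any shard is absent, since parsing without 4-7 silently
    drops entities (including malicious ones defined only there)."""
    missing = [s for s in SHARDS
               if not os.path.exists(os.path.join(data_dir, s))]
    if missing:
        raise FileNotFoundError(f"Missing E3-THEIA shards: {missing}")
    print("All 9 shards present; safe to run: "
          "python trace_parser.py --dataset theia")

if __name__ == "__main__":
    check_theia_shards()
```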

Jimmyokok commented 3 months ago

> Do you have any problems reproducing the results on the cadets dataset? After I processed the data and used it for model training, I got the following results during validation: [screenshot]
>
> I repeated the experiment five times and the results were similar.

Similar to the above, the printed confusion matrix can look quite different as the threshold varies. Please refer to the AUC for threshold-insensitive evaluation. Meanwhile, n_neighbors (i.e. k) also affects detection performance, and it is more sensitive on E3-CADETS than on the other datasets. We are also planning to release a new version of MAGIC that is more stable, more efficient, and easier to reproduce from scratch.
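
To make the threshold-insensitive evaluation concrete, here is a minimal sketch using scikit-learn. The k-NN distance scoring below is a generic stand-in (the `knn_anomaly_auc` helper and the random placeholder embeddings are assumptions, not MAGIC's actual evaluation code); the point is that the ROC AUC needs no detection threshold, while k remains a knob worth checking:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors

def knn_anomaly_auc(train_emb, test_emb, labels, k):
    """Score each test embedding by its mean distance to its k nearest
    benign training embeddings, then report a threshold-free ROC AUC."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_emb)
    dist, _ = nn.kneighbors(test_emb)
    scores = dist.mean(axis=1)  # larger distance = more anomalous
    return roc_auc_score(labels, scores)

# Random placeholders standing in for real node embeddings and labels.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 64))
test = rng.normal(size=(200, 64))
labels = rng.integers(0, 2, size=200)

# Sweeping k shows how sensitive the resulting AUC is to that choice.
for k in (100, 400):
    print(f"k={k}: AUC={knn_anomaly_auc(train, test, labels, k):.4f}")
```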

Jimmyokok commented 3 months ago

> Do you have any problems reproducing the results on the cadets dataset? After I processed the data and used it for model training, I got the following results during validation: [screenshot]
>
> I repeated the experiment five times and the results were similar.

For reference, I repeated the evaluation with two different k values: k = 100 yields AUC = 0.9933 and k = 400 yields AUC = 0.9968.

Acomand commented 3 months ago

It works, thanks a lot! [screenshot]