Regarding data preprocessing issues

FDUDSDE / MAGIC

Codes and data for USENIX Security 24 paper "MAGIC: Detecting Advanced Persistent Threats via Masked Graph Representation Learning"

MIT License

Closed: sorrowwwww closed this issue 8 months ago

sorrowwwww commented 8 months ago

Dear Sir, I encountered two issues while training your code from scratch:

1. When preprocessing the trace data, trace_parser.py requires trace.txt, theia.txt, and cadets.txt in order to read the malicious entities, but I do not know where to obtain these files.
2. When reproducing the wget experiment and preprocessing the data, the node feature dimension comes out as 14, meaning that there are 14 node types. However, the pre-trained model you provide has a node feature dimension of 8.

Jimmyokok commented 8 months ago

1. Go to the ThreaTrace repository; you will find these .txt ground-truth files in its "groundtruth" folder.
2. Our preprocessing steps, whose code is provided in _wgetparser.py, use only 8 of the node types. They focus on processes (task) and also include files (file/path), network connections (address/socket/link), and some memory-related node types (process_memory/mmaped_file): valid_node_type = ['file', 'process_memory', 'task', 'mmaped_file', 'path', 'socket', 'address', 'link']
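
To illustrate why restricting to these 8 types gives a node feature dimension of 8, here is a minimal sketch, not the repository's actual parsing code. It assumes nodes arrive as (id, type) pairs and that features are one-hot vectors over valid_node_type. The function name filter_and_encode, the sample node IDs, and the excluded type "machine" are hypothetical and used only for illustration.

```python
# Node types listed in the reply above.
valid_node_type = ['file', 'process_memory', 'task', 'mmaped_file',
                   'path', 'socket', 'address', 'link']

# Map each valid type to its position in the one-hot vector.
type_to_index = {t: i for i, t in enumerate(valid_node_type)}


def filter_and_encode(nodes):
    """Drop nodes whose type is not in valid_node_type and one-hot encode the rest.

    `nodes` is an iterable of (node_id, node_type) pairs (an assumed input shape).
    Returns (kept_ids, features), where each feature row has length 8.
    """
    kept_ids, features = [], []
    for node_id, node_type in nodes:
        idx = type_to_index.get(node_type)
        if idx is None:
            # Node type outside the 8 valid ones: skip it entirely.
            continue
        row = [0] * len(valid_node_type)
        row[idx] = 1
        kept_ids.append(node_id)
        features.append(row)
    return kept_ids, features


# Hypothetical sample: "machine" stands in for any type outside the valid list.
example_nodes = [("n1", "task"), ("n2", "file"),
                 ("n3", "machine"), ("n4", "socket")]
kept_ids, features = filter_and_encode(example_nodes)
print(kept_ids)          # IDs of the nodes that survived the filter
print(len(features[0]))  # feature dimension
```

If preprocessing yields 14 dimensions instead, one likely cause is that every node type found in the raw data was kept rather than only the 8 listed here.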

sorrowwwww commented 8 months ago

Thank you for your patient answer. I have successfully resolved the issue.