Open jiangdie666 opened 2 months ago
Sorry to also report the zero-dimensional tensor error while training on the DARPA dataset. Have you run into this problem during training?
I wonder whether my environment differs from yours. I am running everything in the following environment: Python 3.10.13, pytorch==2.1.0, torchvision==0.16.0, torchaudio==2.1.0, pytorch-cuda=12.1, and `conda install -c dglteam/label/th21_cu121 dgl`. I could not find a suitable build when trying to install your requirement dgl==1.0.0. Could you share how you installed DGL 1.0.0?
I have tried evaluating wget under your environment setting (pytorch==2.1.0 and dgl==2.0.0). I get the same results as with dgl==1.0.0, both with and without the pre-trained pkls.
Did you obtain the graphs.pkl from parsing the raw logs or from the pkl provided by MAGIC?
This time I trained directly from your provided pkl and still had some problems with the results.
What is your k (i.e., num_neighbors)? Using k == 1 on the wget dataset could be the cause.
The "zero-dimension" error is simply a bug. Changing `loss, _ = model(g)` to `loss = model(g)` fixes it.
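For context, here is a minimal stand-alone illustration of why the unpacking form fails when the forward pass returns only a scalar loss (plain Python, no torch required; `model` and `g` are placeholders, not the repository's actual code):

```python
def model(g):
    # Hypothetical forward pass that returns only a scalar loss,
    # rather than a (loss, extra) tuple.
    return 0.42

g = "graph-placeholder"

try:
    loss, _ = model(g)   # buggy: tries to unpack a single scalar value
except TypeError as e:
    print("unpacking failed:", e)

loss = model(g)          # fixed: take the scalar loss directly
print("loss =", loss)
```

The same failure mode occurs with a zero-dimensional PyTorch tensor: it is not iterable, so two-value unpacking raises an error.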
Sorry for such a simple code mistake; I can't believe I missed it. Thankfully, training on the DARPA data now runs successfully!
I didn't change k, but as I read the code, the default is 2 if you don't override it.
Yes. I'm getting normal evaluation results when k == 2 but results like yours when k == 1.
I am very sorry, but after changing the value of k the result is still strange; the scores look too odd.
If your graphs.pkl is not the provided one, make sure that node type in index 2 is 'task'.
I found the problem. Earlier I said I used your original data, but I had only used your checkpoint.pkl; I forgot that graphs.pkl also ships in the graphs.zip archive. The following results were generated with the project's own checkpoint.pkl but with my own graphs.pkl, produced by processing the data from scratch, step by step, following the project.
Then I unzipped graphs.zip and used the project's own graphs.pkl together with its checkpoint, which achieved the expected results. So I think there is a problem either in my initial data processing with the wget_parser.py script, or in the load_rawdata call that generates graphs.pkl.
I'll look into it myself again. Thanks for the reply.
Does your version of graphs.pkl match the size of the provided one? If not, what is your data source?
And most importantly, make sure that the node type at index 2 is 'task', which matters a lot for detection performance. If it is not, find the index of 'task' and change line 28 of ./model/eval.py to `out = pooler(g, out, [index_for_task]).cpu().numpy()`.
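A quick sanity check one can run is to look up where 'task' sits in the ordered list of node-type names used when building graphs.pkl (the list below is an assumed ordering for illustration, not the repository's actual one):

```python
def task_index(node_types):
    """Return the position of 'task' among the node-type names.

    This index is what gets passed to the pooler in eval.py,
    e.g. pooler(g, out, [task_index(node_types)]).
    """
    return node_types.index("task")

# Assumed node-type ordering, purely for illustration.
node_types = ["file", "process", "task", "socket"]
idx = task_index(node_types)
print("pass [%d] to the pooler" % idx)
```

If the index printed here is not 2, eval.py needs the modification described above.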
I retrained on the dataset and added category-printing code to wget_parser, and found that 'task' is indeed at index 3. So I changed the index in the eval code as you said, but the result is still incorrect, and my graphs.pkl is exactly the same size as the graphs.pkl in your zip. Very strange.
Is it possible that the order of the raw logs differs, so that labels are assigned incorrectly during loaddata, with the shift in node-type indices as a byproduct?
I just tried indexes 0-7, and none of them worked well. I'll re-download the data this afternoon and try building it again. It's a really strange problem.
I forget whether the attack logs should be the first 25 or the last 25 logs parsed, but this absolutely matters.
Your comment woke me up. I had been so convinced that neither my environment nor my code changes were at fault that I never questioned the initial dataset processing. I found that the original code listed the 150 graph files with a bare `ls` call (i.e., in unsorted order), which may mean the first 25 logs processed did not actually correspond to the ATTACK data. I modified the code to confirm that this was the problem.
This was modified before running
Here's the modified run
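For anyone hitting the same issue, here is a minimal sketch of the kind of fix described above, assuming the parser builds its file list from an unsorted directory listing; sorting makes the attack logs land in a deterministic position before slicing or labeling (the file names and directory here are made up for illustration):

```python
import os
import tempfile

# os.listdir (like a bare `ls` in code) returns entries in arbitrary order,
# so "the first N logs" may not be the attack logs. Sorting the listing
# makes the ordering deterministic.
log_dir = tempfile.mkdtemp()
for name in ["benign-2.log", "attack-1.log", "benign-1.log", "attack-0.log"]:
    open(os.path.join(log_dir, name), "w").close()

files = sorted(os.listdir(log_dir))  # the fix: sort before slicing/labeling
attack_logs = [f for f in files if f.startswith("attack")]
print(files)
print("attack logs come first:", files[:len(attack_logs)] == attack_logs)
```

With a sorted listing, whichever naming convention the raw logs use, every machine processes them in the same order, so the label assignment no longer depends on filesystem enumeration order.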
So the 'task' index shifted also because the data hadn't been processed properly; in the end the eval code doesn't need to change, and the index stays 2.
Thank you so much for answering my questions over and over again! Thank you!
@jiangdie666 @Jimmyokok Hello. Thank you for sharing. I had the same problem.
I have another question. If I'm not wrong, in the original paper the results for wget were reported as follows:
I have done the Quick Evaluation and got the following results:
```
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
Loading processed wget dataset...
[n_graph, n_node_feat, n_edge_feat]: [150, 8, 4]
AUC: 0.9359999999999999
F1: 0.9056603768600924
PRECISION: 0.8571428571428571
RECALL: 0.96
TN: 21  FN: 1  TP: 24  FP: 4
```
I also saw that the last results @jiangdie666 shared were close to mine. What might be the reason for the different results for Precision, F1, and AUC?
I have rerun the Quick Evaluation with exactly the same data, checkpoints, and code as in this repository, which gives me:

```
AUC: 0.96
F1: 0.9599999994999999
PRECISION: 0.96
RECALL: 0.96
TN: 24  FN: 1  TP: 24  FP: 1
```
Then, I modified the code to repeat the evaluation with random seeds 0 to 49 and report the average, which gives me:

```
AUC:       0.952864 ± 0.013846093456278552
F1:        0.9595209114984354 ± 0.016390904351784117
PRECISION: 0.9663880341880342 ± 0.031628609309857315
RECALL:    0.9536 ± 0.018521339044464343
TN: 24.14 ± 0.8248636250920511
FN: 1.16 ± 0.463033476111609
TP: 23.84 ± 0.463033476111609
FP: 0.86 ± 0.8248636250920512
```
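The seed-averaging procedure can be sketched as follows; `evaluate` here is a hypothetical stand-in for the repository's evaluation entry point, not its real API:

```python
import random
import statistics

def evaluate(seed):
    """Hypothetical stand-in for one evaluation run with a given seed.

    Returns a deterministic pseudo-random AUC-like score so the
    averaging loop below is reproducible.
    """
    rng = random.Random(seed)
    return 0.95 + rng.uniform(-0.01, 0.01)

# Repeat the evaluation with seeds 0..49 and report mean ± std.
scores = [evaluate(seed) for seed in range(50)]
mean = statistics.mean(scores)
std = statistics.pstdev(scores)
print("AUC: %.6f ± %.6f" % (mean, std))
```

Averaging over many seeds is a reasonable way to check whether a single seed (such as 2022) is an outlier on a particular machine.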
This is extremely strange, since I have never seen as many as 4 FPs. Meanwhile, I'm sure I have n_neighbor == 2, which is the standard setting, and I have tried these evaluations with PyTorch 1.x and 2.x respectively; both yield the same result.
With seed 2022, which matches the repository code, I'm getting:

```
AUC: 0.9616
F1: 0.9795918362349021
PRECISION: 1.0
RECALL: 0.96
TN: 25  FN: 1  TP: 24  FP: 0
```
Actually, even using the original checkpoint-wget.pt that ships with your project, my FP count is still 4, far from your results above.
Try averaging over multiple seeds? Perhaps seed 2022 just happens to perform very badly on other devices?
With the graphs.pkl extracted from the project's own graphs.zip, the results are satisfactory, which means there is still a small bug somewhere in the wget data-preprocessing code.
I have evaluated both the pkl generated by training on the raw data with your project's code and the trained pkl that ships with your project. But the results are not satisfactory; could it be that I didn't set some other parameters? Raw-data evaluation results from my own training:
This is with the pkl that comes with your project.