Does the dataset in the raw folder need to be the original JSONL dataset, or does the CPG structure need to be extracted as well? Is it mandatory to include 'edge_type'? #5

Vul-LMGNN / vul-LMGGNN

Code for the paper - Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graph

Apache License 2.0

42 stars, 10 forks


Type help or browse(help) to begin
joern> Would you like to save changes? (y/N)
Dataset chunk 0 not processed.
<class 'pandas.core.frame.DataFrame'>
Index: 497 entries, 2 to 3330
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
 0   target  497 non-null    int64
 1   func    497 non-null    object
 2   Index   497 non-null    int64
 3   cpg     497 non-null    object
dtypes: int64(2), object(2)
memory usage: 439.0 KB
CPG cut - original nodes: 235 to max: 205
CPG cut - original nodes: 237 to max: 205
CPG cut - original nodes: 229 to max: 205
CPG cut - original nodes: 206 to max: 205
Traceback (most recent call last):
  File "run.py", line 190, in <module>
    Embed_generator()
  File "run.py", line 88, in Embed_generator
    cpg_dataset["input"] = cpg_dataset.apply(lambda row: process.nodes_to_input(row.nodes, row.target, context.nodes_dim,
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 9423, in apply
    return op.apply().finalize(self, method="apply")
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/core/apply.py", line 678, in apply
    return self.apply_standard()
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/core/apply.py", line 798, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/core/apply.py", line 814, in apply_series_generator
    results[i] = self.f(v)
  File "run.py", line 88, in <lambda>
    cpg_dataset["input"] = cpg_dataset.apply(lambda row: process.nodes_to_input(row.nodes, row.target, context.nodes_dim,
TypeError: nodes_to_input() missing 1 required positional argument: 'edge_type'
root@autodl-container-b7ed44ad80-2b59cbac:~/autodl-tmp/vul-LMGGNN#

Have you fixed the error yet? I also got the same error, due to a missing 'keyed_vector'.
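The TypeError in the traceback says that the lambda on line 88 of run.py passes fewer arguments than nodes_to_input requires. One way to see which arguments a function expects before calling it is Python's inspect module. The following is only a minimal sketch: the function below is a hypothetical stand-in (only the names nodes_to_input and edge_type come from the traceback, the real signature in the repository may differ and may also require 'keyed_vector'), and the values 205 and "Ast" are placeholders, not settings confirmed by the repository.

```python
import inspect


# Hypothetical stand-in for process.nodes_to_input. Only the function name and
# the 'edge_type' parameter name are taken from the traceback; the rest is assumed.
def nodes_to_input(nodes, target, nodes_dim, edge_type):
    return {
        "n_nodes": min(len(nodes), nodes_dim),
        "target": target,
        "edge_type": edge_type,
    }


def required_params(func):
    """Return the names of parameters that have no default value."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty and param.kind in named_kinds
    ]


# Toy rows standing in for the DataFrame rows in run.py (assumed shape).
rows = [
    {"nodes": [1, 2, 3], "target": 0},
    {"nodes": [4, 5], "target": 1},
]

# Reproduce the failure: three arguments for a function that requires four.
error_message = None
try:
    nodes_to_input(rows[0]["nodes"], rows[0]["target"], 205)
except TypeError as exc:
    error_message = str(exc)

# List what the function actually requires, then pass every required argument.
needed = required_params(nodes_to_input)
inputs = [nodes_to_input(r["nodes"], r["target"], 205, "Ast") for r in rows]

print(error_message)
print(needed)
```

Running the same required_params check on the real process.nodes_to_input would show whether 'edge_type' (and 'keyed_vector') are mandatory in the installed version of the code, so the call in run.py can be updated to match.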
