ZJU-CTAG / PDBERT

The replication package of the paper "Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks"

README

This is the replication package of the paper "Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks".

Our datasets and online appendix can be found here.

Requirements

Software

Hardware (optional)

Organization of the Replication Package

How to Run

Intrinsic Evaluation

To obtain the results of Table 1 and Table 2 in our paper:

  1. Go to the pretrain folder (this matters because the scripts rely on relative paths).
  2. For the partial-code intrinsic evaluation results in Table 1, run: python eval_partial_func_pdg.py
  3. For the full-function intrinsic evaluation results in Table 2, run: python eval_full_func_pdg.py
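
The steps above can also be driven from a small script. The sketch below is a hypothetical convenience wrapper and not part of the replication package: it assumes the repository root contains the pretrain folder, and it passes that folder as the working directory so the scripts see the relative paths they expect. The names build_intrinsic_commands and run_intrinsic_eval are illustrative only.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the steps above; the labels are for readability only.
INTRINSIC_SCRIPTS = {
    "Table 1 (partial code)": "eval_partial_func_pdg.py",
    "Table 2 (full function)": "eval_full_func_pdg.py",
}


def build_intrinsic_commands(python_bin=sys.executable):
    """Return {label: argv} for the two intrinsic evaluation scripts."""
    return {label: [python_bin, script] for label, script in INTRINSIC_SCRIPTS.items()}


def run_intrinsic_eval(repo_root=".", python_bin=sys.executable):
    """Run both scripts with the working directory set to <repo_root>/pretrain."""
    pretrain_dir = Path(repo_root) / "pretrain"  # assumed location of the folder
    for label, argv in build_intrinsic_commands(python_bin).items():
        print(f"[{label}] {' '.join(argv)} (cwd={pretrain_dir})")
        # check=True stops at the first script that exits with an error.
        subprocess.run(argv, cwd=pretrain_dir, check=True)
```

Calling run_intrinsic_eval() from the repository root would then execute both evaluations in order; nothing is run until that function is called.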

Note

Extrinsic Evaluation

We use three vulnerability analysis tasks for extrinsic evaluation: vulnerability detection, vulnerability classification and vulnerability assessment.

Preparation

To run training and testing as a unified pipeline, first edit downstream/global_vars.json. The key of the object in downstream/global_vars.json should be the name of your machine (run the Python command import platform; print(platform.node()) to check it), and python_bin should be the path to your Python binary.
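
As a rough illustration, the snippet below prints an entry that could be pasted into downstream/global_vars.json. It is a sketch under one assumption: that each machine-name key maps to an object containing python_bin. The real file may require further fields that are not shown here, and the helper name make_global_vars_entry is hypothetical.

```python
import json
import platform
import sys


def make_global_vars_entry(python_bin=sys.executable):
    """Build one entry keyed by this machine's name, as described above."""
    # Assumption: the machine key maps to an object holding "python_bin";
    # any other fields the real file needs are not covered by this sketch.
    return {platform.node(): {"python_bin": python_bin}}


# Print the entry for the current machine and the current interpreter.
print(json.dumps(make_global_vars_entry(), indent=2))
```

Using sys.executable as the default means the entry points at the interpreter that ran the snippet, which is usually the one with the project's dependencies installed.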

Vulnerability Detection

  1. Go to the downstream folder (this matters because the scripts rely on relative paths).
  2. For each of the three datasets, run the corresponding command:
    • ReVeal: python train_eval_from_config.py -config configs/vul_detect/pdbert_reveal.jsonnet -task_name vul_detect/reveal -average binary
    • Devign: python train_eval_from_config.py -config configs/vul_detect/pdbert_devign.jsonnet -task_name vul_detect/devign -average binary
    • BigVul: python train_eval_from_config.py -config configs/vul_detect/pdbert_bigvul.jsonnet -task_name vul_detect/bigvul -average binary
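
The three commands differ only in the dataset name, so they can be generated in a loop. The following is a minimal sketch that prints the commands rather than running them; the helper build_detect_command is hypothetical, while the script name, config paths, and flags are copied from the list above.

```python
import sys

# Dataset identifiers as they appear in the config and task names above.
DATASETS = ["reveal", "devign", "bigvul"]


def build_detect_command(dataset, python_bin=sys.executable):
    """Return the argv list for one vulnerability detection run."""
    return [
        python_bin, "train_eval_from_config.py",
        "-config", f"configs/vul_detect/pdbert_{dataset}.jsonnet",
        "-task_name", f"vul_detect/{dataset}",
        "-average", "binary",
    ]


# Dry run: print each command so it can be checked against the list above.
for name in DATASETS:
    print(" ".join(build_detect_command(name, python_bin="python")))
```

To execute a command instead of printing it, the argv list could be passed to subprocess.run with cwd set to the downstream folder.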

CWE Classification

  1. Go to the downstream folder (this matters because the scripts rely on relative paths).
  2. Run python train_eval_from_config.py -config configs/cwe_class/pdbert.jsonnet -task_name cwe_class -average macro -extra_averages weighted

Vulnerability Assessment

  1. Go to the downstream folder (this matters because the scripts rely on relative paths).
  2. Run python train_eval_multi_task_from_config.py -config configs/vul_assess/pdbert.jsonnet -task_name vul_assess -extra_eval_configs "{\"task_names\":\"CPL,AVL,CFD,ITG\"}" -eval_script eval_multi_task_classification -average macro -extra_averages weighted
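
The escaped quotes in -extra_eval_configs are easy to get wrong in a shell. One way to sidestep the escaping, sketched below, is to build the JSON value with json.dumps and pass the arguments as a list. The helper build_assess_command is hypothetical; the script name, flags, and values are copied from the command above.

```python
import json
import shlex
import sys


def build_assess_command(python_bin=sys.executable):
    """Return the argv list for the vulnerability assessment run."""
    # Compact separators reproduce the JSON string used in the command above.
    extra_eval_configs = json.dumps(
        {"task_names": "CPL,AVL,CFD,ITG"}, separators=(",", ":")
    )
    return [
        python_bin, "train_eval_multi_task_from_config.py",
        "-config", "configs/vul_assess/pdbert.jsonnet",
        "-task_name", "vul_assess",
        "-extra_eval_configs", extra_eval_configs,
        "-eval_script", "eval_multi_task_classification",
        "-average", "macro",
        "-extra_averages", "weighted",
    ]


argv = build_assess_command("python")
# shlex.join shows a shell-safe form of the same command for copy and paste.
print(shlex.join(argv))
```

Passing argv directly to subprocess.run (with cwd set to the downstream folder) hands the JSON string to the script unchanged, with no shell quoting involved.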

Note: