VulDetProject / ReVeal

MIT License

Reveal pipeline is not GGNN based? #7

Open MeNicefellow opened 3 years ago

MeNicefellow commented 3 years ago

According to the README, to try the ReVeal pipeline as a whole you should run "bash run_verum.sh" or "bash run_devign.sh". The model these scripts run is MetricLearningModel in models.py, which is not GGNN-based, whereas the pipeline described in the paper uses a GGNN.

davidhin commented 3 years ago

This is because these representation models are already taking GGNN output as their input. Maybe the README is a bit misleading, as it implies that it goes from start to end (feature extraction -> GGNN -> representation learning). However, those scripts only run the last step (representation learning), and the GGNN output is already present in the data they provide through Google Drive.
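To make the data flow concrete, here is a minimal sketch of the three stages and of what running only the last one looks like. Every function name and the keys `graph_feature` and `target` are hypothetical placeholders chosen for illustration, not the repository's actual API, and the "GGNN" stage is a toy stand-in rather than a real gated graph network.

```python
from typing import Dict, List, Tuple


def extract_graph(code: str) -> Dict[str, list]:
    """Stage 1 (hypothetical): turn source code into a toy token graph."""
    tokens = code.split()
    edges: List[Tuple[int, int]] = [(i, i + 1) for i in range(len(tokens) - 1)]
    return {"nodes": tokens, "edges": edges}


def ggnn_embed(graph: Dict[str, list]) -> List[float]:
    """Stage 2 (placeholder, NOT a real GGNN): map a graph to a fixed-size vector."""
    return [float(len(graph["nodes"])), float(len(graph["edges"]))]


def representation_learning(features: List[float], label: int) -> Dict[str, int]:
    """Stage 3 (placeholder): sees only the feature vector, never the graph."""
    return {"input_dim": len(features), "label": label}


# Full pipeline: feature extraction -> GGNN -> representation learning.
full = representation_learning(ggnn_embed(extract_graph("int x = 0 ;")), 0)

# Last step only: start from a record that already holds the stage-2 output,
# the way the downloaded data is described above.
precomputed = {"graph_feature": [5.0, 4.0], "target": 0}
last_only = representation_learning(precomputed["graph_feature"], precomputed["target"])

print(full == last_only)
```

Under these assumptions the last stage cannot tell whether its input came from a GGNN run in the same process or from a file, which is why a model class with no GGNN layers can still sit at the end of a GGNN-based pipeline.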

for-just-we commented 2 years ago

I noticed that main in attention_main.py, in the Vuld_SySe directory, contains this code:

    elif args.job == 'generate':
        test_data = json.load(open(args.test_file))
        for e in test_data:
            entry = DataEntry(dataset, e['code'], e['label'])
            dataset.add_data_entry(entry, train_example=False)
        dataset.init_data_set()
        model_file = open(args.model_path, 'rb')
        model = torch.load(model_file)
        if args.cuda_device != -1:
            model.cuda(device=args.cuda_device)
        embeddings = generate_embeddings(
            model=model, dataset=dataset, output_path=args.test_output_path, cuda_device=args.cuda_device)

I presume the generate_embeddings function is used to generate graph embeddings, and the code just loads a pre-trained model without telling you what the model looks like.
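One way to find out what such a model looks like is to inspect the object after loading it. Since the snippet calls model.cuda() directly on the result of torch.load, the file presumably holds a whole pickled nn.Module rather than a state dict, so printing it should show its layers. The sketch below assumes that and uses a small stand-in network, because the real checkpoint and its class are not shown here; `describe` is a hypothetical helper name.

```python
import torch
from torch import nn

# Stand-in for the object returned by torch.load(model_file).
# The layer sizes are arbitrary and are NOT the repository's architecture.
model = nn.Sequential(nn.Linear(200, 256), nn.ReLU(), nn.Linear(256, 2))


def describe(module: nn.Module) -> list:
    """Return (parameter name, shape) pairs for every learnable tensor."""
    return [(name, tuple(p.shape)) for name, p in module.named_parameters()]


# Printing an nn.Module shows its submodule tree.
print(model)

# The parameter shapes show the input and output size of each layer.
for name, shape in describe(model):
    print(name, shape)
```

Note that unpickling a whole module only works when the class definition it was saved from is importable, so the architecture still has to exist somewhere in the repository's source.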