Open MeNicefellow opened 3 years ago
This is because these representation models already take GGNN output as their input. Maybe the README is a bit misleading, as it implies the pipeline runs from start to end (feature extraction -> GGNN -> representation learning). However, those scripts only run the last step (representation learning), and the GGNN output is already present in the data provided through Google Drive.
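To make the point concrete, here is a minimal sketch of what "taking GGNN output as input" means: the downloaded data already carries a per-example graph feature vector, so the representation-learning scripts only need to read those vectors. The field names (`graph_feature`, `target`) are illustrative assumptions, not the repo's actual schema.

```python
import json

# Assumed (hypothetical) schema: each entry already holds the
# GGNN-stage output, so no feature extraction or GGNN runs here.
precomputed = json.loads(
    '[{"graph_feature": [0.1, 0.2, 0.3], "target": 1}]'
)

def load_representation_inputs(entries):
    """Return (feature, label) pairs ready for representation learning."""
    return [(e["graph_feature"], e["target"]) for e in entries]

pairs = load_representation_inputs(precomputed)
```

In other words, the scripts consume stage-2 output directly; nothing in them recomputes the graph embeddings.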
I noticed that in the `main` of `attention_main.py` in the `Vuld_SySe` directory, there is this code:
```python
elif args.job == 'generate':
    test_data = json.load(open(args.test_file))
    for e in test_data:
        entry = DataEntry(dataset, e['code'], e['label'])
        dataset.add_data_entry(entry, train_example=False)
    dataset.init_data_set()
    model_file = open(args.model_path, 'rb')
    model = torch.load(model_file)
    if args.cuda_device != -1:
        model.cuda(device=args.cuda_device)
    embeddings = generate_embeddings(
        model=model, dataset=dataset, output_path=args.test_output_path,
        cuda_device=args.cuda_device)
```
I presume the `generate_embeddings` function is used to generate the graph embeddings, and the code just loads a pre-trained model without telling you what the model looks like.
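As a rough sketch (my assumption, not the repo's actual implementation), a `generate_embeddings`-style function could simply run each example through the loaded model and dump the resulting vectors to disk. The model is stubbed below as a plain callable; in the repo it is the torch model loaded from `args.model_path`.

```python
import json
import tempfile

def generate_embeddings(model, dataset, output_path):
    """Hypothetical sketch: embed every example and write JSON to disk."""
    embeddings = [{"embedding": model(code)} for code in dataset]
    with open(output_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings

# Stub model: the code string's length as a 1-d "embedding".
stub_model = lambda code: [float(len(code))]
out_path = tempfile.mktemp(suffix=".json")
out = generate_embeddings(stub_model, ["int main(){}"], out_path)
```

Even under this sketch, the actual architecture stays opaque: you only see whatever `torch.load` deserializes, which is exactly the problem being raised here.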
According to the README file, to try the ReVeal pipeline as a whole, you should run `bash run_verum.sh` or `bash run_devign.sh`. The model that actually runs is `MetricLearningModel` in `models.py`, which is not GGNN-based, even though the pipeline in the paper uses a GGNN.
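For contrast, the representation-learning stage in the ReVeal paper is metric-learning style (a triplet-type objective on the learned representations) rather than a GGNN. A minimal sketch of such a loss, with plain lists standing in for embedding vectors (the helper name and margin value are illustrative assumptions):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull same-class pairs together,
    push different-class pairs apart by at least `margin`."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# Negative already far away -> loss hits zero.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [5.0, 0.0])
# Negative too close -> positive loss.
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
```

So the scripts exercise this kind of representation head on precomputed features; the GGNN itself never runs in them.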