How to run your code with SLEIPNIR dataset

ZaydH / MalwareGAN

Adversarial Malware Generator Using GANs

MIT License


How to run your code with SLEIPNIR dataset #4

Closed vietvo89 closed 3 years ago

vietvo89 commented 3 years ago

Hi Zay

I have got the SLEIPNIR dataset from the author. But your sample code uses a data format different from the SLEIPNIR dataset, which consists of several individual files. So how can I run your MalGAN with the SLEIPNIR dataset?

Thanks

ZaydH commented 3 years ago

It has been a few years since I worked on this code, and I am going off of memory.

The basic idea is you need to convert the SLEIPNIR files into a NumPy ndarray tensor. I found the old code I believe I used and uploaded it to a gist for you. Please try that. You may need to modify it to make it work.
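A minimal sketch of that kind of conversion, for illustration only. It assumes each file holds one sample that can be read into a 1-D feature vector of the same length. The `stack_samples` helper and its `load_one` reader are hypothetical names, not part of the gist or of MalwareGAN, and the real SLEIPNIR files may need a different reader than the `np.load` default used here.

```python
from pathlib import Path
from typing import Callable, Iterable

import numpy as np


def stack_samples(paths: Iterable[Path],
                  load_one: Callable[[Path], np.ndarray] = np.load) -> np.ndarray:
    """Stack one feature vector per file into a (n_samples, n_features) array.

    `load_one` is a placeholder reader; swap in whatever matches the
    actual on-disk format of the dataset.
    """
    # Sort so the row order is reproducible across runs.
    rows = [np.asarray(load_one(p)).ravel() for p in sorted(paths)]
    if not rows:
        raise ValueError("no sample files found")
    n_features = rows[0].shape[0]
    if any(r.shape[0] != n_features for r in rows):
        raise ValueError("all samples must have the same number of features")
    return np.stack(rows).astype(np.float32)
```

Calling `stack_samples(Path("some_dir").glob("*.npy"))` once for the malware files and once for the benign files would give two matrices, which is the general shape of input the reply describes.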

vietvo89 commented 3 years ago

Thank you so much. Let me try your code. But one more thing, if I train MalGAN and have a model, how can I use your code to generate malware to evaluate the success rate of your method against the black-box detector? Is it right if I only use the trained Generator to produce benign samples from malware?

ZaydH commented 3 years ago

I am not sure exactly what you mean, so I will answer based on my best guess. If this is off base, let me know.

The MalwareGAN code serially trains a blackbox detector (you can specify the type) as well as the GAN. I am not sure what you mean by "have a model". You could in theory replace my blackbox detector with your own if you wanted, but you would need to handle that integration.

To determine the success rate as I did, I recommend splitting the dataset into three parts: training, validation, and test. You use the training set to train the model (with validation for hyperparameter selection). Only then do you use the held-out test set to see how well your model performs on totally unseen data. This is the standard flow.
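The three-way split can be sketched as follows. This is only an illustration in plain NumPy: the `three_way_split` name, the 20%/20% fractions, and the seed are assumptions, not values taken from MalwareGAN.

```python
import numpy as np


def three_way_split(x, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint train/validation/test parts."""
    n = len(x)
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    # np.split cuts at the given indices: [0:n_test], [n_test:n_test+n_val], rest.
    test, val, train = np.split(order, [n_test, n_test + n_val])
    return (x[train], y[train]), (x[val], y[val]), (x[test], y[test])
```

The test part is touched only once, after training and hyperparameter selection are finished.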

vietvo89 commented 3 years ago

Thanks, Zay.

I read other papers, and they demonstrated how to run an attack with a GAN. But I want to double-check with you: once I have a trained GAN model, do I need the Generator to attack, that is, to make malware evade detectors? The flow may be feeding malware to the generator and then evaluating how well its output evades the detector.

Thanks

ZaydH commented 3 years ago

Yes.

After you train the model, you take a new malware vector and run it through the generator. This will yield a new vector that should evade the detector. To verify your workflow, you can then run that modified vector through the detector to see if it is marked as clean. This secondary sanity check is clearly not possible in practice but works for scientific evaluation/debugging.
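As a rough sketch of that check, assuming the trained generator and detector are wrapped as plain callables on NumPy arrays and that the detector returns 0 for clean and 1 for malware. `evasion_rate`, `generator`, and `detector` are hypothetical names here, not MalwareGAN's actual API.

```python
import numpy as np


def evasion_rate(generator, detector, malware):
    """Fraction of generator outputs that the detector labels clean (0).

    `generator` maps a (n, d) malware matrix to modified vectors;
    `detector` maps a (n, d) matrix to n labels (assumed 0 = clean, 1 = malware).
    """
    adversarial = generator(malware)
    predictions = np.asarray(detector(adversarial))
    return float(np.mean(predictions == 0))
```

Running the same detector on the unmodified held-out malware gives a baseline to compare this number against.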