
De-pois-An-Attack-Agnostic-Defense-against-Data-Poisoning-Attacks

The most critical downside of AI is that its efficacy is directly tied to the quality of its data. At present, very few methods are available to protect data from being attacked in real-time applications, and no single defence technique is in common use across the various types of attack. In these situations a generic method for defending against poisoning attacks is highly needed, so we have come up with a defence technique called the De-Pois defence method.

Python package dependencies:

- Python 3.7.13
- NumPy 1.21.6
- Pandas 1.3.5
- PyTorch 1.11.0+cu113
- Keras 2.8.0
- scikit-learn 1.0.2
- SciPy 1.4.1
- art 5.6
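One quick way to set these up in a Colab cell is a pinned pip install. The pins below are a sketch and may need adjusting to what the Colab runtime will actually resolve; the "art 5.6" entry above is ambiguous (it may refer to the Adversarial Robustness Toolbox), so it is left out here.

```python
# Colab cell (the "!" prefix runs shell commands inside a notebook).
# Version pins follow the dependency list above; adjust as needed.
!pip install numpy==1.21.6 pandas==1.3.5 keras==2.8.0 scikit-learn==1.0.2 scipy==1.4.1
# PyTorch with the CUDA 11.3 build named in the list:
!pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
```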

Run the code and applications

Run the code on Google Colab, so you do not need to install any application locally. Add all the .py files to Google Drive along with the MNIST dataset, then log in to Google Colab with your Gmail account and access the saved files from Drive by providing the file path as input. First execute Generator_CGAN_authen.py, changing the accessed file paths (e.g. the MNIST dataset path) to match your Drive. Then run the Mimic_model_construction.py file in the same way. Lastly, execute the Main.py file to get the respective results. Also execute Mnist_direct.py or Mnist_generative.py to generate the poisoned samples required to run the code. A minimal Colab cell illustrating these steps is sketched below.
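The following sketch shows these steps in a single Colab cell. The Drive folder name is a hypothetical placeholder, and the `!`/`%cd` notebook commands only work inside Colab/Jupyter; adjust paths and file-name casing to match your copies of the scripts.

```python
# Colab cell: mount Google Drive so the scripts and MNIST files are visible.
from google.colab import drive
drive.mount('/content/drive')

# Hypothetical folder; change to wherever you uploaded the .py files.
%cd /content/drive/MyDrive/De-Pois

!python Generator_CGAN_authen.py      # stage 1: cGAN generator + authenticator
!python Mimic_model_construction.py   # stage 2: build the mimic model
!python Mnist_generative.py           # or Mnist_direct.py: create poisoned samples
!python Main.py                       # stage 3: detect poisoned vs. clean samples
```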

Dataset: MNIST database of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples, and is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

The train and test datasets can be downloaded from http://yann.lecun.com/exdb/mnist/
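If you prefer not to download the files by hand, Keras (already in the dependency list) ships the same dataset. This is only a convenience sketch; the repository's scripts may expect the raw IDX files from the link above instead.

```python
# Load MNIST via Keras instead of downloading the IDX files manually.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```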

Modelling

The De-Pois design process involves three stages:

- Generator_CGAN_authen.py: This stage involves two steps. The cGAN-based generator generates sufficient synthetic training data with a distribution similar to that of the clean dataset. The authenticator uses the newly generated samples at each iteration as instances of missing latent variables lying in the space of the training data, thus making sure the generated data can be trusted and that the samples differ from each other. A sketch of a conditional generator appears after this list.

- Mimic_model_construction.py: This script aims to imitate the target model; using the constructed mimic model we distinguish the poisoned samples from the clean ones.

- Main.py: Employs a detection boundary to set apart the poisoned samples from the clean ones. The poison rate is set to 0.3 in our case. If the mimic model's output is lower than the detection boundary, the sample is regarded as poisoned (see the sketch after this list).

Poisoned data is generated using either mnist_direct.py or mnist_generative.py.
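For orientation, here is a minimal sketch of a conditional (cGAN) generator for MNIST in Keras. The architecture, layer sizes, and latent dimension are illustrative assumptions, not the actual network in Generator_CGAN_authen.py.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

LATENT_DIM = 100   # noise vector size (assumption)
NUM_CLASSES = 10   # MNIST digit labels

def build_generator():
    noise = keras.Input(shape=(LATENT_DIM,))
    label = keras.Input(shape=(1,), dtype="int32")
    # Embed the class label and fold it into the noise vector, so the
    # generated image is conditioned on the requested digit class.
    label_emb = layers.Flatten()(layers.Embedding(NUM_CLASSES, LATENT_DIM)(label))
    x = layers.multiply([noise, label_emb])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(28 * 28, activation="tanh")(x)
    img = layers.Reshape((28, 28, 1))(x)
    return keras.Model([noise, label], img)

# Draw a batch of synthetic digits for random labels.
gen = build_generator()
z = np.random.normal(size=(16, LATENT_DIM))
labels = np.random.randint(0, NUM_CLASSES, size=(16, 1))
fake_images = gen.predict([z, labels])  # shape: (16, 28, 28, 1)
```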
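And a minimal sketch of the detection step described for Main.py, assuming each sample receives a scalar score from the mimic model and that a detection boundary has already been computed; the names `scores` and `boundary` are illustrative, not the repository's API.

```python
import numpy as np

def detect_poisoned(scores, boundary):
    """Flag samples whose mimic-model output falls below the detection boundary."""
    return np.asarray(scores) < boundary  # True = regarded as poisoned

# Illustrative mimic-model outputs for five samples and an example boundary.
scores = [0.91, 0.12, 0.85, 0.07, 0.64]
boundary = 0.5
print(detect_poisoned(scores, boundary))  # [False  True False  True False]
```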

References

- De-Pois research paper: https://arxiv.org/pdf/2105.03592.pdf
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: https://arxiv.org/abs/1511.06434
- MNIST database: http://yann.lecun.com/exdb/mnist/
- SecML tutorial on neural networks with MNIST: https://secml.readthedocs.io/en/stable/tutorials/07-NeuralNetworks-MNIST.html
- Generating poisoning data using the GP-attack: https://github.com/yangcf10/Poisoning-attack