csce585-mlsystems / project-athena

This is the course project for CSCE585: ML Systems. Students will build their machine learning systems based on the provided infrastructure --- Athena.

Task 2 option 2 breakdown and discussion #25

Closed cjshearer closed 4 years ago

cjshearer commented 4 years ago

I spent a few hours this past Thursday coming up with a general breakdown of how I think we (Team Ares) should approach task 2, and I thought I should share it here to get some feedback and maybe provide some direction for anyone feeling lost. Note that this was originally written with my team in mind and has some specific directions for them; hopefully that is not too distracting (or perhaps it may even be helpful). @MENG2010 if you have the chance, I would specifically like your feedback on whether the scope of this plan is appropriate for this task, though any feedback is appreciated.

Informal Breakdown of Our Approach

I've broken task 2 down into three steps. The first is to create some training data, from which we can learn an ensemble strategy. The second is to select a learning model, then train and test variations of that model. The third is to summarize our results and approach in the report. Throughout the entire process and for each step you complete, write down a summary of what you did and what you learned (where necessary). This will save us a lot of time when writing the report. Also, keep a high-level list of contributions you make to the task; it is your responsibility to make sure you get credit for your work.

  1. Generate training/test data from Athena as (X, Y), where:
    1. X is a set of predictions from Athena
      1. I'm thinking we use the adversarial examples provided in the /data/ folder. This will require understanding how the AEs are stored in the .npy files (see the sketch after this list for a starting point).
      2. Once we understand how to use the AEs from the .npy files, we should select a subset of them, ensuring we do not introduce bias into our model by using too many of any one type of AE. We may even decide to include some benign samples in the mix of training/testing data. Whatever we use, let's be sure to use equal numbers of AEs from each attack type.
    2. Y is a set of true labels that matches our selected AEs
      1. We should find out how to get predictions from a weak defense (WD) (see the table below for an example of what I mean).
      2. We then need to select a set of WDs (or just use all 72 of them) to create training data.
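
As a starting point for the .npy files, something like the following should work for peeking at the provided AEs. This is a minimal sketch; the file names below are placeholders for whatever is actually in /data/, and the printed shapes are my guesses.

import numpy as np

# hypothetical file names; use the actual files under /data/
aes = np.load('data/AE-mnist-fgsm.npy')
labels = np.load('data/labels-mnist.npy')

print(aes.shape)     # e.g., (10000, 28, 28, 1) -- one MNIST image per AE
print(labels.shape)  # e.g., (10000,) integer labels, or (10000, 10) one-hot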

Once we are done with this first part, we should have data that looks something like this table:

| AE_id | WD1 | WD2 | ... | Y |
|-------|-----|-----|-----|---|
| 1 | [0.1, 0.4, 0, 0, ..., 0.1] | [0.1, 0, 0.8, 0, ..., 0.1] | ... | 3 |
| 2 | [0.6, 0, 0, 0.3, ..., 0.1] | [0, 0, 0.3, 0.1, ..., 0.2] | ... | 1 |
| ... | ... | ... | ... | ... |
| n | [0.1, 0.4, 0, 0, ..., 0.1] | [0.1, 0, 0.8, 0, ..., 0.1] | ... | 9 |

where each row corresponds to one AE, each WD column holds that weak defense's predicted probability vector for the AE, and Y is the AE's true label.
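
To make the table concrete, here is roughly how I picture assembling X and Y with numpy. Note that get_wd_predictions, aes, and true_labels are placeholders for however we end up querying a WD and loading the data, not an actual Athena API.

import numpy as np

# hypothetical helper: returns an (n_samples, 10) array of probability
# vectors from one weak defense; we collect one such array per WD
preds_per_wd = [get_wd_predictions(wd, aes) for wd in wds]

# X[i] stacks every WD's probability vector for AE i -> shape (n_samples, n_wds, 10)
X = np.stack(preds_per_wd, axis=1)
# Y is the matching vector of true integer labels -> shape (n_samples,)
Y = true_labels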

  2. Learn an ensemble strategy
    1. We need to select an ML model for learning. Logistic regression is the most basic categorical prediction model, but we could also use some type of decision tree. Whatever we use, let's stick to that one model.
    2. We should then decide what metrics we wish to track during training and find out how to do this with Keras. At the very least we must track model loss and accuracy over time.
      1. Run through these tutorials (no setup required; the tutorials can be run in Google Colab).
      2. Once you've finished some of the basic tutorials (it's up to you to figure out how much you will need), figure out how to track model loss and accuracy over each training iteration for one of the example models you create. Then figure out how to save these and any other metrics you use to a CSV file, where each row is a training step and each column is a metric (like model loss), and how to plot the data in the CSV (a minimal sketch of this appears at the end of this breakdown).
    3. Whatever model we select, train a few (3 to 5) variations of it by changing its hyper-parameters. These variations will be trained one at a time, not simultaneously. Note also that we should split the data we generated in step 1 into a training set and a testing set; how we split it (e.g., 80% training / 20% testing) might depend on the model we choose.
      1. Select a set of hyper-parameters
      2. Train the model with that selection of hyper-parameters, tracking the model loss (and any other selected metrics) over each training iteration. Store these metrics in a CSV file, where each row is a training step and each column is a metric (like model loss).
      3. As you train the model, print the model loss after every iteration. A successful hyper-parameter selection will result in the model loss decreasing after each training step. If it doesn't decrease or stops decreasing, stop training: either the selection of hyper-parameters is bad and your loss won't converge, or your model has already converged and further training won't help.
      4. After your model has converged, plot the resulting metrics.
      5. Save the plots, the learned weights/training parameters, and the metrics you tracked during training. Please name these intelligently (include the name of the model and any relevant hyper-parameters in the file name); any member of the group should be able to tell what a file contains without opening it.
      6. Place the files in their respective folders in Task2/. If a file doesn't appear to fit in any folder, ask in the Discord chat where to put it.
      7. Repeat the above steps another 2 to 4 times, adjusting the hyper-parameters you select. Selecting good hyper-parameters is uncertain, so you will need to use whatever knowledge and experience you have to adjust them based on the results you get.
  3. Write the report. This should detail your approaches, results, what you learned, conclusions, etc. Imagine telling yourself what you wish you knew before starting the task. If you kept a summary of what you did and what you learned (as I suggested at the beginning), this should be easy.
    1. Introduce the approaches that are used in the task.
    2. Experimental settings --- the values of the tuneable parameters for each variant.
    3. Evaluation and necessary analysis.
    4. Contribution of individual team members.
    5. Citations to all related works.
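
For the metric tracking in step 2, here is a minimal sketch of what I have in mind, using Keras's CSVLogger and EarlyStopping callbacks (note these record per epoch rather than per batch, which should be fine for our plots). The model, file names, and random stand-in data are placeholders, not final choices.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow import keras

# dummy stand-ins for the (X, Y) data from step 1: 72 WDs x 10 classes
x_train = np.random.rand(800, 72, 10)
y_train = np.random.randint(0, 10, size=800)
x_test = np.random.rand(200, 72, 10)
y_test = np.random.randint(0, 10, size=200)

# placeholder model; swap in whatever model we settle on
model = keras.models.Sequential([
    keras.layers.InputLayer(input_shape=(72, 10)),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # append loss/accuracy to a CSV file, one row per epoch
    keras.callbacks.CSVLogger('metrics_dense100_adam.csv'),
    # stop once the validation loss stops improving
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3),
]
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=50, callbacks=callbacks)

# plot the logged metrics straight from the CSV
history = pd.read_csv('metrics_dense100_adam.csv')
history.plot(x='epoch', y=['loss', 'val_loss'])
plt.savefig('loss_dense100_adam.png')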
MENG2010 commented 4 years ago

I think it is a good plan for task 2 option 2.

The predict function of the Ensemble class (in athena.py) can give you the matrix of raw predictions directly (something you want for the first part).

e.g., get raw predictions

ensemble = Ensemble(classifiers=wds, strategy=ENSEMBLE_STRATEGY.AVEP.value)
raw_prediction = ensemble.predict(x, raw=True)

If you need the final prediction (a vector of probabilities over the classes, e.g., [0.1, 0.9, 0, 0, 0, 0, 0, 0, 0, 0]):

ensemble = Ensemble(classifiers=wds, strategy=ENSEMBLE_STRATEGY.AVEP.value)
prediction = ensemble.predict(x)
pooyanjamshidi commented 4 years ago

@cjshearer very good summary, I loved the details. Our thought was to learn some weights associated with each WD using approaches such as AMC-SSDA. Where does this lie in your plan?

MENG2010 commented 4 years ago

One thing I would like to mention here: you need to train your model on the training data and test it on the test data. Do not train and test the model on the same dataset.

If you want to train on a data set that mixes both benign samples and adversarial examples, I would suggest you separate the AEs into 2 portions (say, 80% for training and 20% for testing).

The subsampling function in utils.data can help you separate the dataset into two independent, identically distributed (i.i.d.) portions (i.e., the two subsets have identical distributions), with minor updates (modifications in utils.data.subsampling):


# inside utils.data.subsampling, after the sample ids have been drawn
# shuffle the selected ids
random.shuffle(sample_ids)
# get sampled data and labels
subsamples = np.asarray([data[i] for i in sample_ids])
sublabels = np.asarray([labels[i] for i in sample_ids])

# insert the following statements
# store the rest samples in a separated array
notselected_samples = np.asarray([data[i] for i in range(pool_size) if i not in sample_ids])
notselected_labels = np.asarray([labels[i] for i in range(pool_size) if i not in sample_ids])

# save all the subsets
# ... 

Then you can get your training and testing data by subsampling from a 10K dataset with a ratio equal to 0.2 (subsamples is the testing data and notselected_samples is the training data) or 0.8 (subsamples is the training data and notselected_samples is the testing data).
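
If it helps, the same idea as a self-contained sketch; split_iid is my own placeholder (a plain shuffle rather than the per-class sampling that utils.data.subsampling does), and the dummy arrays just stand in for a 10K AE set:

import numpy as np

def split_iid(data, labels, ratio=0.2, seed=0):
    # shuffle the indices, then cut once: first portion vs. the rest
    rng = np.random.default_rng(seed)
    ids = rng.permutation(len(data))
    cut = int(len(data) * ratio)
    return (data[ids[:cut]], labels[ids[:cut]]), (data[ids[cut:]], labels[ids[cut:]])

# dummy 10K dataset; ratio=0.2 -> 2K test / 8K train
aes = np.random.rand(10000, 28, 28, 1)
true_labels = np.random.randint(0, 10, size=10000)
(x_test, y_test), (x_train, y_train) = split_iid(aes, true_labels, ratio=0.2)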

MENG2010 commented 4 years ago

Check the tutorial Task2_LearningBasedStrategy for more information.

cjshearer commented 4 years ago

Thanks for the advice and instruction @MENG2010. We will be sure to properly separate the data into a test and training set.

@pooyanjamshidi for now, we are just learning a simple, 3-layer model (see below). The training/testing data needed comes to 16 models × 19 transformations of the MNIST set × 35 MB per 10k predictions = 10.64 GB. Compression brought it down to 126 MB, but storing/sharing larger datasets, as would be required for the AMC-SSDA approach, would take far more space than would be reasonable to store on GitHub (without activating git-lfs, which is disabled for public forks). Other solutions I've found are either paid or would take too long to set up.

from tensorflow import keras

model = keras.models.Sequential([
    # one 10-class probability vector per weak defense
    keras.layers.InputLayer(input_shape=(wds._nb_classifiers, 10), name='WD_layer'),
    # flatten (num WDs, 10) into a single feature vector
    keras.layers.Flatten(),
    keras.layers.Dense(units=100, activation='relu', name='D1'),
    keras.layers.Dense(10, name='output_layer', activation='softmax')
])
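
A minimal compile/fit to go with this, assuming integer class labels like the Y column above; x_train and y_train stand in for the (X, Y) data from step 1:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, validation_split=0.1)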
MENG2010 commented 4 years ago

Task 2 has been submitted.