DukeStatSci / thesis-sp18-wu-anomalydet

Undergraduate honors thesis of James Wu, to be submitted in Spring 2018
https://dukestatsci.github.io/thesis-sp18-wu-anomalydet/

Mid-semester check-in #2

Open mine-cetinkaya-rundel opened 6 years ago

mine-cetinkaya-rundel commented 6 years ago

@jamescwu Please respond to the following questions in a reply to this issue by Wednesday Oct 18, 5pm. Brief answers are ok.

  1. What is the ultimate goal of the thesis?
  2. What portion of the thesis work have you completed so far?
  3. What parts of the analysis do you plan on completing by the end of the semester?
  4. What sections do you expect to include in your Fall semester write-up? (Note that this is due as a gitbook and PDF output on Dec 7 at 9am; I'll take a snapshot of your repo at that time.)

ghost commented 6 years ago

  1. The ultimate goal of the thesis is to explore novel anomaly detection methods for network IP data, which I received from Duke OIT (all IP addresses have been anonymized). After identifying the best-performing methods, I will try to implement them so that they are computationally efficient (we are aiming for a classification time of under 5 seconds for a potential malicious attack) and generalizable to other datasets.
  2. So far I have completed the exploratory data analysis and feature exploration in the context of the dataset, conducted some literature review on the methods currently in use and on applications of our algorithms to other problems, implemented a port matrix transformation so that our methods can be applied to the dataset, and set up the pipeline for testing potential kernels for kernel PCA.
  3. I hope to complete the kernel PCA analysis for a few simple kernels and to implement a naive singular value decomposition (SVD), with computational efficiency not yet considered, for imputing missing values in the transformed ports matrix using multiplicative models.
  4. I expect to include a literature review of currently used methods, my exploratory data analysis with figures, my variable summary for the feature set, and my implementation and visualizations of the naive kernel PCA analyses, along with a discussion of the results. I also hope to include the code for my SVD and an explanation of my SVD findings.
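
For illustration only, here is a minimal sketch of what a naive SVD-based imputation like the one in item 3 could look like. It is not the thesis code. It assumes that missing cells in the ports matrix are stored as NaN and that a "multiplicative model" means a low-rank product of row and column factors. The function name `svd_impute`, its parameters, and the toy matrix are all hypothetical.

```python
import numpy as np


def svd_impute(X, rank=1, n_iter=200, tol=1e-8):
    """Naively fill NaN entries of X using a rank-`rank` truncated SVD."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: mean of the observed entries in each column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Full dense SVD on every pass; no attempt at efficiency.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries fixed, refresh only the missing ones.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol:
            break
    return filled


# Made-up toy matrix (not the Duke OIT data): an exact rank-1 product a_i * b_j.
toy = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
observed = toy.copy()
observed[1, 2] = np.nan
observed[3, 0] = np.nan

completed = svd_impute(observed, rank=1)
print(np.round(completed, 3))
```

Because the toy matrix is exactly rank 1, the filled-in cells can be compared directly against `toy`. On real data the rank would have to be chosen, for example by holding out some observed cells and checking how well they are recovered.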