aschein / bptf

Bayesian Poisson tensor factorization
MIT License

Can tensor factorization be used for dimensionality reduction like non-negative matrix factorization? #1

Open cjnolet opened 6 years ago

cjnolet commented 6 years ago

Specifically, the use-case of dyadic events matches very closely to NetFlow. I'm looking to do outlier detection in NetFlow and would like to use a tensor factorization to reduce the dimensionality of the IPs and ports (based on their counts) so that I could cluster them in a Euclidean space (similar to matrix factorization).
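A minimal sketch of the first step, assuming each flow record has been reduced to a (source IP, destination IP, destination port) triple. That three-mode layout and the helper name `build_count_tensor` are illustrative assumptions, not part of bptf or of any NetFlow schema. It turns the records into the kind of count tensor that a Poisson tensor factorization is fit to.

```python
import numpy as np

def build_count_tensor(flows):
    """Hypothetical helper: count flow records into a (src IP, dst IP, dst port) tensor.

    `flows` is an iterable of (src_ip, dst_ip, dst_port) tuples (an assumed layout).
    Returns the count tensor plus the value-to-index map for each mode.
    """
    flows = list(flows)
    # Give each distinct value in each mode a contiguous integer index.
    src_index = {ip: i for i, ip in enumerate(sorted({f[0] for f in flows}))}
    dst_index = {ip: i for i, ip in enumerate(sorted({f[1] for f in flows}))}
    port_index = {p: i for i, p in enumerate(sorted({f[2] for f in flows}))}

    Y = np.zeros((len(src_index), len(dst_index), len(port_index)), dtype=np.int64)
    for src, dst, port in flows:
        # Each record adds one event to its (src, dst, port) cell.
        Y[src_index[src], dst_index[dst], port_index[port]] += 1
    return Y, src_index, dst_index, port_index
```

After fitting a rank-K CP-style model such as BPTF to `Y`, each source IP corresponds to one row of the source-mode factor matrix; those K-dimensional rows are the low-dimensional representation that could then be clustered.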

Do you see this as a reasonable use-case for your tensor factorization algorithm?

aschein commented 6 years ago

Yes, this is an excellent use case. To do anomaly detection using Poisson factorization, you would fit the model to your data (call it Y), then compute the model's reconstruction of the data (call it M), and then compute the probability of the data under the Poisson distribution, Pois(Y; M). Any entries of the data with low probability (below some threshold) given the model's reconstruction would then be considered anomalies.
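A minimal sketch of the scoring step just described, assuming a three-mode CP-style model whose fitted factor matrices are available as NumPy arrays `A`, `B`, `C`. The helper names (`reconstruct`, `poisson_logpmf`, `flag_anomalies`) are hypothetical and are not the API of the repository's `anomaly_detection.py`. It works with log-probabilities, since the raw PMF of a large, surprising count can underflow to zero.

```python
import math
import numpy as np

def reconstruct(A, B, C):
    """Hypothetical helper: CP reconstruction M[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r].

    A, B, C are nonnegative factor matrices with one column per latent component.
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def poisson_logpmf(Y, M):
    """Elementwise log Pois(Y; M) = Y*log(M) - M - log(Y!)."""
    Y = np.asarray(Y, dtype=float)
    # Guard against log(0); a fitted Poisson model should give strictly positive rates.
    M = np.maximum(np.asarray(M, dtype=float), np.finfo(float).tiny)
    log_y_factorial = np.vectorize(math.lgamma, otypes=[float])(Y + 1.0)
    return Y * np.log(M) - M - log_y_factorial

def flag_anomalies(Y, M, log_threshold):
    """True where an observed count is improbable under its own reconstructed rate."""
    return poisson_logpmf(Y, M) < log_threshold

# Toy example with K = 1 latent component (made-up numbers).
A = np.array([[1.0], [2.0]])
B = np.array([[1.0]])
C = np.array([[1.0], [3.0]])
M = reconstruct(A, B, C)              # rates [[[1, 3]], [[2, 6]]]
Y = np.array([[[1, 3]], [[2, 40]]])   # one count far above its expected rate of 6
mask = flag_anomalies(Y, M, math.log(1e-6))  # only Y[1, 0, 1] is flagged
```

The threshold (here a probability of 1e-6) is a judgment call that would need tuning against known-normal traffic.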

cjnolet commented 6 years ago

@aschein

Thanks so much!

Just to reiterate (so I understand correctly), you are recommending to go through each predicted count of the reconstructed tensor (M) and perform the Poisson calculation using some lambda (as a hyperparameter) and the counts (as k), and choose a threshold that does a reasonable job at finding anomalies?

Intuitively, this would mean that items in the original tensor that may not have been outliers, but that didn't have enough strength in the latent components compared with the rest of the items, have now become outliers because they end up as noise in the reconstruction?

cjnolet commented 6 years ago

Forgive my ignorance. Thinking more about this.

I suppose you are probably saying to normalize the reconstructed tensor counts into a Poisson distribution with lambda = mean count from M, and then threshold for outliers.

aschein commented 6 years ago

@cjnolet I've added some simple code that shows how to use BPTF to detect anomalies: https://github.com/aschein/bptf/blob/master/code/anomaly_detection.py

Please let me know if it works well for your application! (And if you have any more questions, ask away!)

cjnolet commented 6 years ago

@aschein

Ah, I didn't realize I could create the Poisson PMF directly from Y and M. Your first response makes total sense now. I am still learning :-)

Thank you so much for the code example! It is very much appreciated.
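The distinction cjnolet arrives at in this thread, between one global rate (such as the mean of M) and a separate rate for every entry taken from M, can be seen on a toy example. The numbers below are made up for illustration, and `poisson_logpmf` is a hypothetical helper, not a function from the repository. Every count matches its reconstructed rate exactly, yet a single global rate flags the low-count entries simply because they sit far from the average.

```python
import math
import numpy as np

def poisson_logpmf(Y, rate):
    """Elementwise log Pois(Y; rate); `rate` may be a scalar or an array shaped like Y."""
    Y = np.asarray(Y, dtype=float)
    rate = np.asarray(rate, dtype=float)
    log_y_factorial = np.vectorize(math.lgamma, otypes=[float])(Y + 1.0)
    return Y * np.log(rate) - rate - log_y_factorial

# Two low-rate and two high-rate entries, each observed exactly at its expected value.
M = np.array([1.0, 1.0, 50.0, 50.0])
Y = np.array([1, 1, 50, 50])
log_threshold = math.log(1e-6)

# Single global rate (mean of M = 25.5): flags the two entries with count 1.
global_flags = poisson_logpmf(Y, M.mean()) < log_threshold
# Per-entry rates taken directly from M: flags nothing.
per_entry_flags = poisson_logpmf(Y, M) < log_threshold
```

With per-entry rates, an entry is only surprising relative to what the model expected for that particular entry, which is the behavior the first answer describes.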